ABSTRACT

We introduce a new technique based on artificial neural networks which allows us to 
make accurate predictions for the spectral energy distributions (SEDs) of large samples 
of galaxies, at wavelengths ranging from the far-ultra-violet to the sub-millimetre and 
radio. The neural net is trained to reproduce the SEDs predicted by a hybrid code comprised
of the GALFORM semi-analytical model of galaxy formation, which predicts the
full star formation and galaxy merger histories, and the GRASIL spectro-photometric 
code, which carries out a self-consistent calculation of the SED, including absorption 
and emission of radiation by dust. Using a small number of galaxy properties pre- 
dicted by GALFORM, the method reproduces the luminosities of galaxies in the majority 
of cases to within 10% of those computed directly using GRASIL. The method performs 
best in the sub-mm and reasonably well in the mid-infrared and the far-ultraviolet.
The luminosity error introduced by the method has negligible impact on predicted sta- 
tistical distributions, such as luminosity functions or colour distributions of galaxies. 
We use the neural net to predict the overlap between galaxies selected in the rest-frame
UV and in the observer-frame sub-mm at z = 2. We find that around half of the
galaxies with an 850 μm flux above 5 mJy should have optical magnitudes brighter than
$R_{AB} = 25$ mag. However, only 1% of the galaxies selected in the rest-frame UV down
to $R_{AB} < 25$ mag should have 850 μm fluxes brighter than 5 mJy. Our technique will
allow the generation of wide-angle mock catalogues of galaxies selected at rest-frame 
UV or mid- and far-infrared wavelengths. 



1 INTRODUCTION 



Starting with IRAS, and continuing with ISO, the Spitzer 
Space Telescope and AKARI, surveys using space-based in- 
frared telescopes have revealed that many galaxies emit a 
significant fraction of their total luminosity at mid- and far- 
infrared (IR) wavelengths, this emission coming from dust 
grains which have been heated by absorbing optical or ul- 
traviolet (UV) light from stars or AGN. Measurements of 
the integrated extragalactic background light reveal that the 
mid- and far-IR wavelength range contains as much energy as
the ultraviolet and optical parts (Hauser et al. 1998). Dust
therefore plays a key role in shaping the observational sig- 
nature of the overall global star formation history. Building 
on the success of space-based infrared telescopes as well as 
ground-based sub-mm instruments like the Submillimetre 
Common User Bolometric Array (SCUBA) on the James 
Clerk Maxwell Telescope, a number of new instruments and 
space missions are planned which will map the universe at 
wavelengths sensitive to emission from dust (e.g. Herschel,
SCUBA-2, LMT, ALMA). Some of these new instruments
will allow wide field surveys to be carried out, such as the 
Herschel ATLAS survey which will cover 600 square degrees 
and will provide accurate measurements of the clustering 
of galaxies selected by their far-IR emission. These new surveys
will be targeted by other telescopes, building up multi-wavelength
coverage. It is therefore essential to develop theoretical
tools which can take advantage of these new data
and which make predictions of galaxy spectral energy dis- 
tributions (SEDs) over a wide range of wavelengths. 

In this paper we build on a hybrid model introduced by 
Granato et al. (2000) which combines the semi-analytical 
galaxy formation code GALFORM (Cole et al. 2000) with
a spectro-photometric code GRASIL (Silva et al. 1998). 
The semi-analytical model uses simple physically motivated 
recipes and prescriptions to follow the baryonic processes believed
to be important for galaxy formation (see Baugh 2006
for an overview). GRASIL takes the star formation history 
predicted for each model galaxy by GALFORM and makes an
accurate calculation of the SED from the far-UV to the radio.
GRASIL calculates the absorption of starlight by dust
self-consistently by radiative transfer, using the dust mass 
obtained from the gas mass and metallicity, and the disk and
bulge scalelengths predicted by the model. The spectrum of 
dust emission is calculated by solving for the radiative equi- 
librium temperatures of individual dust grains. The hybrid 
GALFORM plus GRASIL model successfully reproduces the 
abundance of Lyman-break galaxies at high redshift (detected
through their emission in the rest-frame UV) and
the number counts and redshifts of submillimetre selected 
galaxies (Baugh et al. 2005). This model has also been used 
to predict the number counts and redshift distributions of 
galaxies as measured in the mid and far IR by the Spitzer 
Space Telescope (Lacey et al. 2008). 

To build mock catalogues with information about the 
spatial distribution of galaxies for wide field surveys like the 
Herschel ATLAS, we need to use the hybrid GALFORM plus 
GRASIL model to populate large volume N-body simulations. 
In this paper we use the Millennium Simulation of the evo- 
lution of structure in a cold dark matter universe (Springel 
et al. 2005). The simulation volume is $500\,h^{-1}$ Mpc on a side
and contains around 20 million dark matter haloes at the 
present day. To build a mock catalogue for the Herschel ATLAS,
which extends to z ≈ 2, would require us to populate
around 30 snapshots from the Millennium, which would run 
to around 500 million dark matter haloes. The GRASIL code 
takes several minutes to run for each galaxy, so to process
on the order of one billion galaxies would take around 100 
years on current large computers. 
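
As a rough illustration of the scale of this problem, the sketch below turns the figures quoted above into a CPU-time budget. The run time of 3 minutes per galaxy is an assumed stand-in for "several minutes", and the implied number of processors follows only from that assumption; it is not a figure taken from the text.

```python
# Back-of-envelope CPU budget for running GRASIL on every galaxy.
# ASSUMPTION: 3 minutes per galaxy stands in for "several minutes".
MINUTES_PER_GALAXY = 3.0
N_GALAXIES = 1.0e9          # "on the order of one billion galaxies"
WALL_CLOCK_YEARS = 100.0    # "around 100 years"

MINUTES_PER_YEAR = 60.0 * 24.0 * 365.25


def cpu_years(n_galaxies, minutes_per_galaxy):
    """Total single-processor time, in years."""
    return n_galaxies * minutes_per_galaxy / MINUTES_PER_YEAR


total_cpu_years = cpu_years(N_GALAXIES, MINUTES_PER_GALAXY)
# Number of processors that would finish the job in the quoted wall-clock time.
implied_processors = total_cpu_years / WALL_CLOCK_YEARS

print(f"total CPU time: {total_cpu_years:.0f} years")
print(f"implied processors: {implied_processors:.0f}")
```

Under these assumptions the total is several thousand CPU-years, which is why a fast emulator of the GRASIL calculation is attractive.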

In this paper we explore an alternative approach in 
which we train an artificial neural network to mimic the 
calculation of SEDs by GRASIL. We show that it is possible 
to construct a neural net which, starting from a small num- 
ber of galaxy properties which can be readily predicted by 
GALFORM, can produce reasonably accurate predictions of the 
luminosity which would result from a direct calculation with 
GRASIL. We note that a complementary approach in which 
an artificial neural network is trained to speed up part of 
the calculation carried out by GRASIL has been developed 
by Silva et al. (2009, in preparation). 

Here we introduce the neural net technique and apply 
it to study the overlap between Lyman-break galaxies and 
submillimetre selected galaxies. We give a brief overview of 
GALFORM and GRASIL in Section 2 and explain how they are 
combined into a hybrid code to predict the spectral energy 
distributions of galaxies. In Section 3, we give some theo- 
retical background to artificial neural networks. Section 4 is 
devoted to an investigation of the accuracy of the neural net 
in predicting galaxy luminosities for different choices for the 
set-up of the net. We apply the new technique to the pre- 
diction of the luminosity functions of Lyman-break galaxies 
at z = 3, mid-IR selected galaxies at z = 0.5 and submillimetre
galaxies at z = 2 in Section 5, where we compare
the results from the neural net against the direct calcula- 
tions from GRASIL. We show how well the model can predict 
colour distributions in Section 6. In Section 7, we examine 
the overlap between galaxies selected in the rest-frame UV 
and in the observer-frame sub-millimetre. Finally, in Section
8, we present our conclusions. Throughout we assume the 
cosmology of the Millennium simulation with a present-day 
matter density of $\Omega_m = 0.25$ and a cosmological constant of
$\Omega_\Lambda = 0.75$.



2 THEORETICAL BACKGROUND I: 

MODELLING THE GALAXY POPULATION 

In this section, for completeness, we give a brief overview 
of the semi-analytical galaxy formation model GALFORM and 
the spectro-photometric code GRASIL. We also explain how 
these codes can be used in combination to predict the full 
spectral energy distributions of a population of galaxies. 

2.1 The galaxy formation model: GALFORM 

The fate of baryons in a universe in which structure in 
the dark matter forms hierarchically depends on a range 
of often complex and nonlinear physical phenomena. The 
GALFORM code models these processes using physically moti- 
vated recipes. Some parts of the model are better understood 
than others. For example, the merger history of dark matter 
haloes has been modelled extensively using N-body simula- 
tions of gravitational instability and accurate Monte Carlo 
techniques have been developed to replicate the merger histories
(e.g. Parkinson, Cole & Helly 2008). On the other
hand, the rate at which stars form from a reservoir of cold
gas is not well understood theoretically and is modelled by 
adopting a prescription which contains parameters. The val- 
ues of the parameters are fixed by requiring that the model 
reproduces a subset of observations of the galaxy population. 
The philosophy behind semi-analytical modelling is set out 
in the review by Baugh (2006). Full details of the GALFORM 
model are given by Cole et al. (2000) and in subsequent 
papers which have presented developments of the original 
model (Benson et al. 2003; Baugh et al. 2005; Bower et al. 
2006; Font et al. 2008). A useful summary of the model 
used in this paper, that of Baugh et al. (2005), is given 
by Lacey et al. (2008). The Baugh et al. model reproduces 
the observed abundance of Lyman-break galaxies and galax- 
ies detected with the SCUBA instrument. We use the ANN 
model to investigate the overlap between these populations 
in Section 7. 

The key point is that GALFORM predicts
the full star formation and chemical enrichment history of 
galaxies. The starting point is the merger history of a dark 
matter halo. The rules describing the baryonic physics are 
applied to gas in the merger tree, starting from the branches 
which are in place at the earliest time. The code then fol- 
lows the gas cooling, star formation, feedback processes and 
galaxy mergers. The star formation history, which includes 
the metallicity of the stars made at each timestep, is the 
primary ingredient required to compute the spectral energy 
distribution of a galaxy. In its standard mode of operation, 
GALFORM uses a stellar population synthesis model (such as 
the one devised by Bruzual & Charlot 2003) to construct
a composite stellar population for each galaxy. Extinction 
of starlight by dust is calculated by assuming that the dust 
and stars are mixed together, rather than by treating the 
dust as a foreground slab. GALFORM predicts the half-mass 
radius of the disk and bulge components of each galaxy (see 
Cole et al. 2000; tests of the model for calculating sizes are 
presented in Cole et al. and also in Almeida et al. 2007 
and Gonzalez et al. 2008). Assuming a random inclination 
angle at which to view the galactic disk, the attenuation of 
starlight is computed using the tabulated results of radiative 
transfer calculations carried out by Ferrara et al. (1999).

2.2 The spectro-photometric model: GRASIL

The GRASIL code (Silva et al. 1998) can be used to accu- 
rately model the observed SEDs of galaxies over a wide 
range of wavelengths - from the far-UV to radio. In the 
standard application of GRASIL, a parameterized star formation
history is tuned until a given observed SED is reproduced
(e.g. Bressan et al. 2002). The main strength
of GRASIL is its sophisticated handling of the extinction and
reprocessing of starlight by dust. The galaxy is modelled as an axially
symmetric system with a disk and bulge component. The
dust is assumed to be divided into two phases: a diffuse 
component and dense, star-forming molecular clouds, with 
the mass fraction between the two being a model parame- 
ter. Stars are born within molecular clouds and then escape 
after a few Myr. The extinction of the light from a set of 
stars depends on their age relative to this escape time. High 
mass stars, which typically dominate the emission at ultra- 
violet wavelengths, spend a significant fraction of their short 
lifetimes within the optically thick molecular clouds. Con- 
sequently, the emission at these wavelengths is heavily ex- 
tincted. The time for a star to escape from a molecular cloud 
is a model parameter. GRASIL calculates the radiative trans- 
fer of starlight through this dust distribution (molecular 
clouds and cirrus), and then solves for the temperature dis- 
tribution of the dust grains at each point in the galaxy self- 
consistently based on the local stellar radiation field. This 
temperature distribution is then used to calculate the dust 
emission. Effects of very small grains, subject to temperature 
fluctuations, as well as polycyclic aromatic hydrocarbons 
(PAHs) are included. The model is calibrated against available
data for normal and starburst galaxies in the local universe
(Bressan et al. 2002; Vega et al. 2005; Panuzzo et al. 2007;
Schurer et al. 2009). The self-consistent calculation of
dust temperatures by GRASIL avoids the need to impose a 
dust temperature by hand, as is common in other models. 



2.3 A hybrid model: GALFORM plus GRASIL 

Granato et al. (2000) described how the GRASIL code can be 
used to compute the SEDs of GALFORM galaxies. The semi- 
analytical code predicts the star formation history of each 
galaxy, outputting the star formation rate in all the pro- 
genitors of the galaxy, stored in bins of metallicity. The 
model also outputs the scale lengths of the galaxy's disk 
and bulge, and the mass and metallicity of the cold gas. 
GRASIL takes this information and produces an unextincted 
and an extincted SED for the galaxy. The calculation car- 
ried out by GRASIL improves over the standard calculation 
made by GALFORM in two main areas: i) dust extinction at 
short wavelengths, which is strongly affected by molecular 
clouds, is calculated more accurately; and ii) the emission of 
radiation by dust is included. 



3 ARTIFICIAL NEURAL NETWORKS 

Artificial neural networks (ANNs) are mathematical con- 
structs designed to replicate the behaviour of the human 
brain. Given a training set of observations consisting of in- 
puts with an associated set of outputs, the role of the ANN 
is to "learn" from these observations in order to be able to
predict the output from a new set of inputs. The origins of
the technique date back to McCulloch & Pitts (1943), who
developed a simple network using artificial neurons to perform
logical operations. However, the concept of learning
was only introduced a few years after this by Hebb (1949)
and implemented by Rosenblatt (1958, 1962). Nowadays
ANNs are widely used in computer science, finance, physics, 
mathematics, astronomy and many other areas. Typical ap- 
plications include pattern recognition, function approxima- 
tion, prediction and forecasting, and categorization. Even 
though neural networks have traditionally been viewed as 
black boxes, for which the user has little knowledge of their 
internal workings, they offer a number of advantages over 
other data mining and analysis tools, such as the ability to 
learn and applicability to a wide range of problems. Also, ANNs
can be readily parallelized. 



3.1 Basic concepts 

In simple terms, the brain can be thought of as a collec- 
tion of billions of special cells called neurons, which pro- 
cess information and are interconnected through synapses 
in a complex net. Neurons work by receiving electrochem- 
ical signals from other neurons, some of which will excite 
the cell whereas others will inhibit it. The neuron adds up 
these inputs and if the sum exceeds a certain threshold, it 
will transmit the same signal to other neurons. In this case 
the neuron is said to be activated. 

ANNs are similar to their biological counterparts: they 
consist of simple computational units (also called neurons or 
nodes), which are connected in a network. For every neuron
we need to specify the input connections and their associated
weights, $w_{jk}$. The neuron multiplies each input by its weight
and adds the contributions from the interconnected units.
The sum is then mapped by the activation function, $f$, to
the output value, which, in turn, will become an input to
the next group of adjacent neurons. If we define $i_k$ as the
input signal coming from neuron $k$, and $w_{jk}$ as the weight
between the input $k$ and neuron $j$, then the output, $o_j$, from
the neuron is given by:

$$ o_j = f\left( \sum_k w_{jk}\, i_k \right). \qquad (1) $$
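
The weighted sum and activation of Eq. 1 can be sketched in a few lines. This is a minimal illustration, not the implementation used in this work: the function names and the example numbers are assumptions, and the sigmoid is used simply because it is the default activation adopted later in the paper.

```python
import math


def sigmoid(x, steepness=1.0):
    """Logistic activation, written to avoid overflow for large |x|."""
    z = steepness * x
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def neuron_output(inputs, weights, activation=sigmoid):
    """Eq. 1 for one neuron j: o_j = f(sum_k w_jk * i_k)."""
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input")
    weighted_sum = sum(w * i for w, i in zip(weights, inputs))
    return activation(weighted_sum)


# Illustrative values only: three inputs feeding a single neuron.
example_inputs = [0.5, -1.0, 2.0]
example_weights = [0.4, 0.3, -0.1]   # weighted sum = 0.2 - 0.3 - 0.2 = -0.3
example_output = neuron_output(example_inputs, example_weights)
```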



There are numerous types of ANNs which differ in the 
way the neurons are organized and exchange information. It 
is common to group the neurons into layers. In general, there 
is an input layer, an output layer and some number of hidden 
layers in between. The input layer is responsible for handling 
the input data. It is clear that there is no activation function 
associated with this layer, because the output values of its
neurons are simply set to be equal to their input values. The 
output of the network is recovered from the output layer. 
Using only one input layer and one output layer, it is possible 
to construct a very simple network called a perceptron. The 
perceptron can recognize simple patterns in data. For more 
difficult tasks, we need hidden layers between the input and
output layers. The term "hidden" is used because the user 
does not have direct access to the inputs and outputs dealt 
with by these layers. 

Networks with more than just an input and output 
layer are called multilayer networks or multilayer perceptrons.
They are the most widely used due to their ability to
learn nonlinear functions. 

The three most popular network configurations are: the 
perceptron (no hidden layers), the feed-forward network and the
recurrent network (the latter two both use hidden layers).
Feed-forward nets are the most widely employed due
to their simplicity. These ANNs pass information from the 
input layer, through the hidden layers to the output neu- 
rons. In recurrent networks, on the other hand, the output 
from the neurons can be fed backwards, through feedback 
connections, and act as input. Such behaviour is similar to 
that found in the biological brain. Even though recurrent 
networks can perform better than the feed-forward nets, 
they suffer from a major drawback: training is more difficult 
due to their oscillatory, even chaotic behaviour, resulting in 
longer computing times. 



3.2 Training 

There are two types of learning: supervised and unsuper- 
vised. In the first case, the network is presented with a tar- 
get consisting of a set of inputs with associated outputs. 
The ANN adapts its weights in order to reproduce the de- 
sired output. In unsupervised learning, the network does not 
have a target output. In this case, the aim is to find patterns 
and to group the data. In this paper we focus on supervised 
learning. 

There are several different approaches to supervised 
learning. Most share the common feature that the ANN 
learns by comparing the predicted output to the target out- 
put. The algorithm of this process is simple: (i) start with an 
untrained net; (ii) determine the output from a given input; 
(iii) compare the output to the target output and compute 
an error; (iv) adjust the weights in order to reduce the error. 

The most widely used learning algorithm is the 
backpropagation algorithm, which was introduced by
Rumelhart et al. (1986). This algorithm finds a local minimum
of the error function:

$$ E = \frac{1}{2} \sum_k \left( t_k - o_k \right)^2, \qquad (2) $$



where $t_k$ represents the desired or target output values, and
$o_k$ is the predicted output from the neuron. Using the gradient
descent method, it can be shown that the update to the
weights from the hidden layer to the output layer is given
by:

$$ \Delta w_{jk} = -\eta \left( \frac{\partial E}{\partial w_{jk}} \right) = \eta\, \delta_j\, o_k, \qquad (3) $$

with $\eta$ known as the learning rate, $\delta_j = (t_j - o_j)\, f'(i_j)$ (where
the activation function, $f$, is differentiable and $i_j$ is the summed
input to neuron $j$) and $o_k$ is the output
from the preceding hidden neuron. A similar expression
can be found for the variation of the weights between the 
input and hidden layers. 
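
The update rule of Eq. 3 can be illustrated for a single output neuron. The sketch below assumes a sigmoid activation and uses made-up hidden-layer outputs, initial weights, target and learning rate; it is meant only to show that repeated application of the rule lowers the error of Eq. 2.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)


def update_output_weights(weights, hidden_out, target, eta):
    """One gradient-descent step (Eq. 3) for the weights feeding one output neuron.

    Returns the updated weights and the error (Eq. 2) measured before the update.
    """
    summed_input = sum(w * o for w, o in zip(weights, hidden_out))
    output = sigmoid(summed_input)
    delta = (target - output) * sigmoid_prime(summed_input)
    new_weights = [w + eta * delta * o for w, o in zip(weights, hidden_out)]
    error = 0.5 * (target - output) ** 2
    return new_weights, error


# Illustrative values only.
hidden_out = [0.2, 0.7, 0.5]   # outputs o_k of the preceding hidden neurons
weights = [0.1, -0.3, 0.2]     # initial w_jk
target = 0.9                   # t_j
errors = []
for _ in range(50):
    weights, err = update_output_weights(weights, hidden_out, target, eta=0.5)
    errors.append(err)
```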

It is clear that if the surface corresponding to the er- 
ror function has multiple local minima then this method, as 
originally defined, will only guarantee convergence towards 
one of the minima but not necessarily to the global minimum.
However, this is not an insurmountable problem since,
during the first steps of the gradient descent, the weights will
gradually move towards the global minimum. Moreover, in
the worst-case scenario, the weights will converge to a local
minimum in the vicinity. There are several methods to
avoid this behaviour: we could add an extra term to Eq. 3,
$\beta\, \Delta w'_{jk}$, called the momentum, which has the same direction
as the previous step change, $\Delta w'_{jk}$, and is controlled by the
coefficient $\beta$. Alternatively, we can train the network several
times using the same training sample but with different
initial random weights. The latter approach is the one we
will follow in this paper. A further refinement is the resilient
backpropagation algorithm (Riedmiller & Braun 1993), in
which instead of adopting the full change in the weights
specified by Eq. 3 we only use the sign of the derivative
multiplied by a constant. We also adopt this method in our 
network. The resilient backpropagation algorithm has the 
advantage of being one of the fastest learning algorithms. 
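
A sign-based update of this kind can be sketched as follows. This is a common variant of resilient backpropagation without weight backtracking, in which each weight also carries its own adaptive step size; the growth and shrink factors (1.2 and 0.5), the step limits and the toy error surface are assumptions chosen for illustration, not settings taken from this work.

```python
def sign(x):
    return (x > 0) - (x < 0)


def rprop_minimise(grad, w0, n_steps=100, step0=0.1,
                   grow=1.2, shrink=0.5, step_min=1e-6, step_max=1.0):
    """Minimise a function using only the sign of each partial derivative."""
    w = list(w0)
    steps = [step0] * len(w)
    prev_grad = [0.0] * len(w)
    for _ in range(n_steps):
        g = grad(w)
        for k in range(len(w)):
            agreement = g[k] * prev_grad[k]
            if agreement > 0:      # same sign as last time: take bigger steps
                steps[k] = min(steps[k] * grow, step_max)
            elif agreement < 0:    # sign flipped: we overshot, take smaller steps
                steps[k] = max(steps[k] * shrink, step_min)
            w[k] -= sign(g[k]) * steps[k]
            prev_grad[k] = g[k]
    return w


def toy_grad(w):
    """Gradient of the toy error surface E = (w0 - 3)^2 + 10 (w1 + 1)^2."""
    return [2.0 * (w[0] - 3.0), 20.0 * (w[1] + 1.0)]


w_final = rprop_minimise(toy_grad, [0.0, 0.0])
```

Because only the sign of the gradient enters the update, the very different curvatures along the two axes of the toy surface do not slow the search, which is one reason such schemes tend to train quickly.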

In an ideal situation, the ANN would, of course, find the 
optimal set of weights such that the error function is mini- 
mized. However, there is an important aspect that we have 
to bear in mind. One of the reasons why we use ANNs is that 
we need to achieve generalization; i.e. it is more important 
to find the network that best fits the testing or validation 
set, than it is to find the minimum of the error function for 
the training set. In fact, if the net is overtrained it will start 
to fit the noise associated with the data instead of the un- 
derlying signal. This leads to overfitting and consequently 
may affect the performance of the ANN on the validation 
sample. One of the procedures deployed to avoid overfitting 
is to use a so-called early stop. In this case, the training 
process is terminated when either the error function reaches 
a pre-defined threshold or a maximum number of iterations 
is reached. The latter choice is adopted in this paper. 
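
The two stopping criteria described above can be written as a small training loop. In this sketch `step` is a hypothetical callable that performs one training epoch and returns the current error; the toy error sequence is invented for the example, and the default of 5000 epochs mirrors the value quoted in Section 4.2.

```python
def train_with_early_stop(step, max_epochs=5000, error_threshold=None):
    """Call step() once per epoch; stop at max_epochs or when error <= threshold.

    Returns (epochs_run, reason, error_history).
    """
    history = []
    for epoch in range(1, max_epochs + 1):
        error = step()
        history.append(error)
        if error_threshold is not None and error <= error_threshold:
            return epoch, "threshold", history
    return max_epochs, "max_epochs", history


def make_toy_step(initial_error=1.0, decay=0.9):
    """Hypothetical stand-in for a training epoch: the error shrinks by 10% each call."""
    state = {"error": initial_error}

    def step():
        state["error"] *= decay
        return state["error"]

    return step


# Stop on the iteration cap (the choice adopted in this paper).
epochs_a, reason_a, _ = train_with_early_stop(make_toy_step(), max_epochs=20)
# Stop on an error threshold instead.
epochs_b, reason_b, _ = train_with_early_stop(make_toy_step(), error_threshold=0.01)
```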

Finally, a brief word about the form of the activation 
function. As we saw, the activation function plays an important
role in the neural network. It allows us to activate
or deactivate neurons and adds the nonlinearity needed to 
solve complex problems. Evidently, activation functions only 
make sense for the hidden and output layers, not for in- 
put layers. There are many activation functions; in fact, 
any nonlinear function would fit the bill. The most common
are: the sigmoid function, $f(x) = 1/(1 + e^{-\alpha x})$, where
$\alpha$ is the steepness; the Gaussian, $f(x) = e^{-\alpha^2 x^2}$; and the Elliot,
$f(x) = \alpha x/(1 + |\alpha x|)$. Our default choice is the sigmoid
function; we contrast the performance of the ANN with this 
activation function against some of the others listed above 
in Section 4.5.3. 
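
For reference, the three activation functions can be written down directly. The Gaussian is taken here as exp(-alpha^2 x^2), which is an assumed form, and the default steepness alpha = 1 is purely illustrative.

```python
import math


def sigmoid(x, alpha=1.0):
    """f(x) = 1 / (1 + exp(-alpha x)); output in (0, 1)."""
    z = alpha * x
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)          # avoids overflow for very negative z
    return e / (1.0 + e)


def gaussian(x, alpha=1.0):
    """f(x) = exp(-(alpha x)^2); output in (0, 1], peaked at x = 0."""
    return math.exp(-(alpha * x) ** 2)


def elliot(x, alpha=1.0):
    """f(x) = alpha x / (1 + |alpha x|); output in (-1, 1), no exponential needed."""
    return alpha * x / (1.0 + abs(alpha * x))
```

The sigmoid and Elliot functions are both monotonic and saturate for large |x|, whereas the Gaussian responds only to inputs near zero.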



4 APPLICATION OF ANN TO GALFORM 
PLUS GRASIL 

Our main objective is to predict a galaxy's spectral energy 
distribution using a small set of its physical properties as 
predicted by GALFORM. This is far from a simple proposition,
due to the complexity of the individual spectra and the wide 
range of spectral energy distributions found in a population 
of galaxies. In this section we explain how we use the ANN 
to predict spectra or luminosities, showing the first results 
and discussing some performance issues.

4.1 Training and testing samples

The training process is crucial for ANNs. The better the 
network learns about the characteristics of the training set, 
the better it will perform when predicting the spectra of a 
new set of galaxies. 

The galaxy spectra calculated by GRASIL are far from 
simple. As noted in Section 2.2, GRASIL calculates the stellar
emission, dust extinction and dust emission, using the
star formation and metal enrichment histories predicted by 
GALFORM. As a consequence, GRASIL spectra are complex and 
varied. The SEDs we compute from GRASIL comprise, in our 
application, 456 wavelength bins (this number can be varied 
in GRASIL), so the output has a high dimensionality. Fig. 1
shows some examples of spectra produced by GRASIL. In 
the top panel, we plot the spectral energy distribution of 
a randomly selected galaxy (black line), showing the differ- 
ent contributions (extincted starlight, molecular dust clouds 
and diffuse dust). The mid-infrared emission in this partic- 
ular galaxy is dominated by PAH molecular bands and the 
far-infrared by cirrus (diffuse dust) emission. Further exam- 
ples of total galaxy SEDs are shown in the bottom panel of 
this plot. 

Fig. 2 shows a quantitative view of the complexity of
the spectra output by GRASIL for a population of galaxies. 
We plot the ratio between the standard deviation and the 
mean of the normalized spectra for a representative sample
of galaxies. This plot shows that the ultraviolet, mid-infrared,
microwave and the radio regions of the spectrum
show the most variety in galaxy SEDs. The visible and far- 
infrared parts of the model spectra show, by comparison, 
less variance. 
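
The statistic plotted in Fig. 2 can be sketched as below. Two simplifications are assumed: each spectrum is normalized by the plain sum of its bins as a stand-in for the bolometric luminosity, and the population form of the standard deviation is used. The three-bin toy spectra are invented for the example.

```python
import math


def normalise(spectrum):
    """Divide by the sum over bins (assumed proxy for the bolometric luminosity)."""
    total = sum(spectrum)
    return [flux / total for flux in spectrum]


def std_over_mean(spectra):
    """Per-bin ratio of standard deviation to mean over a sample of spectra."""
    normed = [normalise(s) for s in spectra]
    n_gal = len(normed)
    ratios = []
    for b in range(len(normed[0])):
        column = [s[b] for s in normed]
        mean = sum(column) / n_gal
        variance = sum((v - mean) ** 2 for v in column) / n_gal
        ratios.append(math.sqrt(variance) / mean)
    return ratios


# Toy sample: the first two spectra have the same shape, the third differs.
toy_spectra = [[1.0, 2.0, 1.0], [2.0, 4.0, 2.0], [1.0, 1.0, 2.0]]
ratios = std_over_mean(toy_spectra)
```

Bins in which all normalized spectra agree give a ratio of zero; the larger the ratio, the more varied the SEDs are at that wavelength.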

Each spectrum is composed of 456 flux bins, so our first 
approach will be to set the number of output neurons in 
our net to be 456, one for each flux bin. Later on we will 
try different methods in order to reduce the dimensionality 
and variance of the output space. For use in the ANN, we 
first normalize the total luminosity in each SED to unity. 
We then assign the logarithm of the flux at each wavelength 
to these outputs in order to reduce the dynamic range of the 
training data. 
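
The preparation of the output values can be sketched as follows. The base of the logarithm is not specified above, so base 10 is assumed, and the sum over bins again stands in for the total luminosity; the three-bin spectrum is illustrative.

```python
import math


def prepare_targets(spectrum):
    """Normalize a spectrum to unit total, then take log10 of each bin."""
    total = sum(spectrum)
    if total <= 0.0 or min(spectrum) <= 0.0:
        raise ValueError("all fluxes must be positive to take the logarithm")
    return [math.log10(flux / total) for flux in spectrum]


# A toy spectrum spanning two orders of magnitude in flux.
toy_targets = prepare_targets([1.0, 9.0, 90.0])
```

After this step the values span about two units rather than a factor of about one hundred, which is the reduction in dynamic range referred to above.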

The selection of the input for the ANN is less straight- 
forward. The natural choice would be to adopt the same in- 
put as used directly by GRASIL to create the spectra, i.e. the 
star formation and metal enrichment histories along with 
the gas mass and metallicity, and the scale-lengths of the 
disk and bulge. However, this is hard to implement due to 
the enormous number of input variables implied (more than 
3000 taking into account the different timesteps and bins of 
metallicity in which the star formation histories are stored). 
This, in turn, would represent a substantial amount of com- 
puting time and complexity for the learning process. To keep 
things simple, we decided to use a small set of galaxy prop- 
erties, measured at the output redshift at which the galaxy's 
SED is required. After some investigation, we found that a 
useful set of galaxy properties to serve as input to the ANN 
is: total stellar mass, stellar metallicity, bolometric luminos- 
ity, circular velocity of the disc measured at the half-mass 
radius, the effective circular velocity of the bulge, disc and 
bulge half-mass radii, V-band luminosity weighted age, V- 
band dust extinction optical depth, metallicity of the cold 
gas, the mass of stars formed in the last burst and the time
since the start of the last burst of star formation. (Recall that
bursts are triggered by galaxy mergers or by disks becoming
dynamically unstable; the latter process does not operate
in the Baugh et al. model which is used as an example in
this paper.) We therefore construct an input layer with 12
galaxy properties. It is important to note that this set of
input galaxy properties has been tuned for the Baugh et al.
(2005) model at z = 0. For a different model, the ANN
might perform better with a different set of galaxy properties
as inputs. We also note that the above list of input
galaxy properties includes the circular velocities of the disk
and bulge; these properties affect the SED through their effect
on the efficiency of supernova feedback and on the star
formation timescale.

Figure 1. Example galaxy SEDs, as computed by GRASIL using
star formation histories predicted by GALFORM. In the top panel,
we show the different components of a galaxy spectrum: the black
line shows the total SED, which is the result of adding the extincted
starlight (green short-dashed line) and the emission from
diffuse cirrus (blue dotted line) and molecular clouds (red long-dashed
line). The stellar contribution plotted here includes
emission from dust in the envelopes of AGB stars, and
also thermal and synchrotron radio emission at long wavelengths.
Selected examples of total spectra are shown in the bottom panel.

Figure 2. The ratio between the standard deviation and
the mean of GRASIL spectra for a representative sample of
GALFORM galaxies. The spectra were all normalized by dividing
by the bolometric luminosity.

Figure 3. Four randomly selected examples of galaxy SEDs predicted
by the ANN (red) compared with the SEDs calculated
directly by GRASIL (black).

In this section, the training and testing samples were 
extracted from a large catalogue of galaxies from the Baugh 
et al. model at z = 0 following a similar procedure to that
used by Granato et al. (2000). The GALFORM catalogue is
sampled to give equal numbers of galaxies in logarithmic
bins of total stellar mass.^1 This strategy yields 1945 galaxy
spectra for the training set, each of which is composed of 
456 flux bins, i.e. a 1945 x 456 data array. A further set of 
1898 galaxies was used as a validation sample.
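
Sampling equal numbers of galaxies per logarithmic mass bin can be sketched as below. The function name, the number of bins, the number drawn per bin and the toy catalogue are all assumptions made for the example; they are not the values used to build the samples described above.

```python
import math
import random


def sample_equal_per_log_bin(masses, n_bins, n_per_bin, seed=0):
    """Return indices of galaxies, drawing up to n_per_bin from each log-mass bin."""
    rng = random.Random(seed)
    log_m = [math.log10(m) for m in masses]
    lo, hi = min(log_m), max(log_m)
    width = (hi - lo) / n_bins
    if width == 0.0:            # all masses identical: everything lands in bin 0
        width = 1.0
    bins = [[] for _ in range(n_bins)]
    for idx, lm in enumerate(log_m):
        b = min(int((lm - lo) / width), n_bins - 1)
        bins[b].append(idx)
    chosen = []
    for members in bins:
        chosen.extend(rng.sample(members, min(n_per_bin, len(members))))
    return chosen


# Toy catalogue dominated by low-mass galaxies, as in a real mass function.
_rng = random.Random(42)
toy_masses = [10.0 ** (8.0 + 4.0 * _rng.random() ** 3) for _ in range(5000)]
selected = sample_equal_per_log_bin(toy_masses, n_bins=4, n_per_bin=50)
```

Sampling in this way stops the far more numerous low-mass galaxies from dominating the training set, so that the net sees comparable numbers of examples across the full mass range.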

4.2 Predicting spectra 

To predict luminosities, we use a supervised feed-forward 
neural network composed of 12 neurons in the input layer 
(which correspond to the 12 galaxy properties listed above), 
60 neurons in one hidden layer and 456 neurons in the output 
layer (which are set to be equal to the logarithm of the 
flux in each of the spectrum bins). The ANN architecture is
therefore 12:60:456. 
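
To make the size of such a network concrete, the sketch below builds a 12:60:456 feed-forward net with random weights and passes one input vector through it. The weight range, the random seed and the absence of bias terms are assumptions made for the illustration, and the net is untrained, so only the shapes are meaningful.

```python
import math
import random


def sigmoid(x):
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def init_layer(n_in, n_out, rng):
    """One weight per (output neuron, input neuron) pair, drawn at random."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def forward(inputs, layers):
    """Feed the signal through each layer in turn, applying Eq. 1 at every neuron."""
    signal = list(inputs)
    for weights in layers:
        signal = [sigmoid(sum(w * s for w, s in zip(row, signal))) for row in weights]
    return signal


sizes = [12, 60, 456]                      # input : hidden : output
rng = random.Random(1)
layers = [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]
n_weights = sum(a * b for a, b in zip(sizes[:-1], sizes[1:]))

output = forward([0.0] * 12, layers)       # 12 (standardized) galaxy properties in
```

With this architecture there are 12 x 60 + 60 x 456 = 28080 weights to be determined during training.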

Unless otherwise specified, the following procedures and 
parameters were chosen: (i) in order to deal with the differ- 
ent ranges of the input and output properties, we subtract 
the mean (computed over the training sample) from each 
input and output and divide by the respective standard deviation;
(ii) we adopt a sigmoid activation function; (iii) the
maximum number of training epochs is set to 5000 (i.e. this
is the criterion used to stop the training process); (iv) in order
to improve the chances of converging towards the global minimum of
the error function (see previous section), we train the ANN
ten times using different initial random weights, and select
the one that gives the smallest root mean square logarithmic
error (see definition below) for the validation sample.
Later in this section, we will show how the results change
on modifying the ANN parameters.

^1 In a later section, we will also use galaxies which have an ongoing
burst or which recently experienced a burst. In this case,
the sample is constructed using logarithmic bins of burst mass
instead.
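
Steps (i) and (iv) of the procedure listed above can be sketched as follows. Here `train_once` is a hypothetical callable that trains one network from random initial weights and returns the model together with its validation error; the toy version below simply returns a random number in its place.

```python
import math
import random


def fit_standardiser(rows):
    """Column means and standard deviations, computed over the training sample."""
    n = len(rows)
    n_cols = len(rows[0])
    means = [sum(r[c] for r in rows) / n for c in range(n_cols)]
    stds = [math.sqrt(sum((r[c] - means[c]) ** 2 for r in rows) / n)
            for c in range(n_cols)]
    return means, stds


def standardise(rows, means, stds):
    """Subtract the mean and divide by the standard deviation, column by column."""
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def best_of_restarts(train_once, n_restarts=10, seed=0):
    """Train n_restarts times from different random weights; keep the best run."""
    master = random.Random(seed)
    runs = [train_once(random.Random(master.randrange(2 ** 32)))
            for _ in range(n_restarts)]
    return min(runs, key=lambda run: run[1])


def toy_train_once(rng):
    """Hypothetical stand-in: returns (model, validation_error)."""
    start = rng.uniform(-1.0, 1.0)
    return start, abs(start)


train_rows = [[1.0, 10.0], [3.0, 30.0]]
means, stds = fit_standardiser(train_rows)
scaled = standardise(train_rows, means, stds)
best_model, best_error = best_of_restarts(toy_train_once)
```

The same means and standard deviations would then be applied unchanged to the validation sample, so that both samples are expressed on the same scale.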

In Fig. 3 we plot four randomly selected examples of
the spectra predicted by the ANN and compare these with
the original spectra. Fig. 3 shows that even without further
optimization, the spectra predicted using the ANN agree, on 
the whole, reasonably well with the original spectra, partic- 
ularly at visible and near-infrared wavelengths. However, in 
certain wavelength ranges, some galaxies exhibit predicted 
luminosities which differ by more than an order of magni- 
tude from their true values. 

To gain a more quantitative feel for the performance of
the ANN, we plot in Fig. 4 the ratio of the predicted to
original luminosity for selected, representative wavelength
bins: the FOCA (the Focal Corrector Anastigmat balloon-borne
camera) 0.2 μm, B (0.44 μm), IRAC (the Infrared Array
Camera on Spitzer) 8 μm and SCUBA (Submillimetre Common
User Bolometer Array) 850 μm bands. The statistics of the
distributions are also summarized in Table 1. Here the root
mean squared logarithmic error is defined as:



$$ \varepsilon_L = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left[ \ln \left( L_{\rm predicted} / L_{\rm original} \right) \right]^{2} } , \qquad (4) $$

and $P_{|e|<10\%}$ gives the percentage of galaxies with predicted
luminosities which lie within 10% of the true values. Note
that $\varepsilon_L$ has a similar form to the error function which the
ANN attempts to minimize, as given in Eq. 2.
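
The two summary statistics can be computed as in the sketch below. It is an illustration under stated assumptions rather than the code used here: the function names are hypothetical, the luminosities are toy values in arbitrary units, and the fractional error is assumed to be measured relative to the original luminosity.

```python
import numpy as np

def rms_log_error(predicted, original):
    """Root mean square of ln(L_predicted / L_original), as in Eq. 4."""
    predicted = np.asarray(predicted, dtype=float)
    original = np.asarray(original, dtype=float)
    log_ratio = np.log(predicted / original)
    return float(np.sqrt(np.mean(log_ratio ** 2)))

def percent_within(predicted, original, tolerance=0.10):
    """Percentage of galaxies whose fractional error is below `tolerance`.

    Assumption: fractional error = (predicted - original) / original.
    """
    predicted = np.asarray(predicted, dtype=float)
    original = np.asarray(original, dtype=float)
    fractional_error = (predicted - original) / original
    return float(100.0 * np.mean(np.abs(fractional_error) < tolerance))

# Toy luminosities in arbitrary units (illustrative values only).
original = np.array([1.0, 1.0, 1.0, 1.0])
predicted = np.array([1.05, 1.20, 0.95, 0.50])

eps_L = rms_log_error(predicted, original)
p10 = percent_within(predicted, original)
```

Because the logarithm enters squared, a single badly predicted galaxy can dominate the root mean square error while leaving the percentage within 10% almost unchanged, which is why both statistics are quoted.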

Fig. 4 shows that overall there is reasonable agreement
Figure 4. The logarithm of the ratio of the luminosity predicted by the ANN to the true luminosity, at selected wavelengths, for the
case in which the ANN predicts the full SED. From top to bottom, left to right, the panels show this ratio for the FOCA 0.2 μm, B
(0.44 μm), IRAC 8 μm and SCUBA 850 μm bands. The solid, short-dashed and long-dashed lines show the median, the 33rd-66th and 5th-95th
percentiles of the distribution, respectively.



Band            $\varepsilon_L$  $P_{|e|<10\%}$      P1       Q1      Q2      Q3     P99

Bolometric           0.19            73.8          -33.2     -4.2     0.2     5.9    28.4
FOCA 0.2 μm          0.22            50.7          -71.3     -9.8    -0.5     9.9    53.4
B (0.44 μm)          0.12            75.8          -38.3     -5.1     0.2     5.6    31.6
IRAC 8 μm            0.15            79.7          -48.7     -3.7     0.7     4.5    34.1
SCUBA 850 μm         0.32            59.8         -177.1     -7.5     1.6     8.6    54.1

Table 1. Summary statistics for the distribution of the error on the spectra predicted by the ANN, when using the entire spectrum as
the ANN output layer. We give errors on the predicted bolometric luminosity and for four different bands: FOCA 0.2 μm, B (0.44 μm),
IRAC 8 μm and SCUBA 850 μm. $\varepsilon_L$ is the root mean square logarithmic error given by Eq. 4. $P_{|e|<10\%}$ shows the percentage of galaxies with
predicted luminosities within 10% of the true values. P1, Q1, Q2, Q3 and P99 give the 1st percentile, 1st quartile, median, 3rd quartile and
99th percentile of the error distribution, respectively.
between the predicted and original luminosities in the
analysed bands. For most galaxies, the predicted luminosity
lies within 20% of the original luminosity. The performance
of the ANN is wavelength dependent, with the results for the
B and infrared bands being better than those for the
ultraviolet and submillimetre bands. This is mainly due to
the increased variance of the spectra in these latter regions
(see Fig. [?]). We obtain a value of $\varepsilon_L$ = 0.22 in the FOCA
0.2 μm band, 0.12 in the B band, 0.15 in the IRAC 8 μm band
and 0.32 in the SCUBA 850 μm band. Also, the error has a
tendency to increase with luminosity, as revealed by the
broadening of the 5th-95th percentile range of the
distribution. This suggests that the ANN has more difficulty
dealing with bright galaxies. This could be due to the
greater complexity of the star formation histories of these
galaxies, with the mechanism invoked to suppress the
formation of bright galaxies playing an increasingly
important role for more luminous galaxies (i.e. superwind
feedback in the case of Baugh et al. 2005). This in turn will
induce an increase in the variety of spectra produced by
GRASIL.

In summary, this first attempt to predict spectra from 
a given set of galaxy properties, using artificial neural net- 
works, has proven to work quite well. For most of the spec- 
tral range, we find that around 75% of the predicted spectra 
deviate by 10% or less from the true spectrum. However, 
there is a considerable error associated with this method, 
particularly for those wavelengths where the dust emission 
dominates over the stellar emission. In such cases, we found
a high value of the statistic $\varepsilon_L$ and a reduction in the
percentage of galaxies with predicted luminosities within 10%
of the true value.



4.3 Incorporating Principal Components Analysis 
of the spectra 

A simple way to speed up the ANN and to potentially boost 
its accuracy is to reduce the dimensionality of the output, 
which in our case is the number of bins used to describe 
the spectrum. The reason for this is clear: the ANN should 
converge more rapidly to a trained network, because there 
are fewer weights to be adjusted. The dimensionality of the 
spectra can be reduced by using a principal components 
analysis (PCA). The PCA works by finding patterns in the 
dataset, producing a new set of linear, orthogonal basis vec- 
tors, which describe the directions of maximum variance. 
Hence the spectra can be represented, to a high level of ac- 
curacy, by a small number (e.g. around 10) of basis vectors 
(each of which is 456 wavelength bins long). 
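
The reduction in the number of adjustable weights can be made concrete with a short count. The sketch below assumes a fully connected network with one bias term per neuron, which is a common convention but is not stated in the text; the function name is hypothetical, and the 20-output case is used only as an example of a compressed output layer.

```python
def count_weights(layers, bias=True):
    """Number of adjustable weights in a fully connected feed-forward network.

    `layers` lists the number of neurons per layer, e.g. [12, 60, 456].
    Assumption: one bias per neuron in every non-input layer when bias=True.
    """
    extra = 1 if bias else 0
    total = 0
    for n_in, n_out in zip(layers[:-1], layers[1:]):
        total += (n_in + extra) * n_out
    return total

full_spectrum = count_weights([12, 60, 456])  # 13*60 + 61*456
compressed = count_weights([12, 60, 20])      # 13*60 + 61*20
ratio = full_spectrum / compressed
```

Under these assumptions the full-spectrum network has 28596 weights and the 20-output network has 2000, so almost all of the weights sit between the hidden layer and the output layer.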

The starting point of the PCA is the dataset of n galaxy
spectra (again normalized to unit total luminosity and taken
as logs), each of which is described by an m-dimensional
vector, x (in our case m = 456). The data sample consists of
n × m data points, $x_{ij}$. The first step of the PCA is to
subtract the mean from all the data dimensions, such that the
mean of each of the m data dimensions is zero:

$$ \tilde{x}_{ij} = x_{ij} - \bar{x}_j , \qquad (5) $$

where i = 1, ..., n and j = 1, ..., m, and the mean is

$$ \bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij} . \qquad (6) $$

We then compute the covariance matrix

$$ C_{jk} = \frac{1}{n} \sum_{i=1}^{n} \tilde{x}_{ij} \, \tilde{x}_{ik} , \qquad (7) $$

where the variances of each variable are given by:

$$ \sigma_j^{2} = C_{jj} = \frac{1}{n} \sum_{i=1}^{n} \tilde{x}_{ij}^{2} . \qquad (8) $$

To find the axes of maximum variance we find the eigenvectors
and eigenvalues of the covariance matrix:

$$ C e_j = \lambda_j e_j . \qquad (9) $$

Note that we have assumed that the data set can be
represented by a linear combination of the new eigenvectors.

The next step is to sort the eigenvectors in order of
decreasing eigenvalue, which corresponds to decreasing
variance. This is when the reduction of dimensionality is made.
We can decide how many eigenvectors to retain, based on
how many eigenvectors we think are sufficient to describe
the original data to some desired level of accuracy. We keep
the eigenvectors with the largest eigenvalues, which
correspond to the axes along which the variance is highest. Once
we have selected p eigenvectors, we are ready to project our
original data onto the new basis, thereby retrieving the
principal components (PCs) of the spectra:

$$ A = \tilde{X} E_p^{T} , \qquad (10) $$

where $\tilde{X}$ is the n × m matrix of mean-subtracted spectra and
$E_p$ is the p × m matrix whose rows are the p retained
eigenvectors. To go back to the original basis, from p ≤ m
eigenvectors, and to (partially) reconstruct the original
data, we use:

$$ X \simeq A E_p + \bar{x} , \qquad (11) $$

where the mean spectrum $\bar{x}$ is added to every row.
If the entire set of principal components is used, no informa- 
tion is lost and the full spectrum can be recovered. With 20 
principal components, however, we find that we can extract 
99% of spectral information. 
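
The PCA steps of Eqs. 5 to 11 can be sketched in a few lines of NumPy. This is an illustration on toy data, not the implementation used here: the function names are hypothetical, the covariance is normalized by n (an assumption), and the eigenvectors are stored as columns, so the projection and reconstruction carry a transpose relative to a row-wise convention.

```python
import numpy as np

def pca_fit(spectra):
    """Mean spectrum, eigenvalues and eigenvectors (columns), sorted by decreasing variance."""
    mean = spectra.mean(axis=0)
    centred = spectra - mean
    cov = centred.T @ centred / spectra.shape[0]  # m x m covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh: symmetric matrices, ascending order
    order = np.argsort(eigvals)[::-1]
    return mean, eigvals[order], eigvecs[:, order]

def pca_project(spectra, mean, eigvecs, p):
    """Principal components: projection onto the first p eigenvectors."""
    return (spectra - mean) @ eigvecs[:, :p]

def pca_reconstruct(components, mean, eigvecs):
    """Approximate spectra rebuilt from p principal components."""
    p = components.shape[1]
    return components @ eigvecs[:, :p].T + mean

# Toy data: 200 "spectra" of 30 bins built from 3 hidden patterns plus small noise.
rng = np.random.default_rng(1)
patterns = rng.normal(size=(3, 30))
spectra = rng.normal(size=(200, 3)) @ patterns + 0.01 * rng.normal(size=(200, 30))

mean, eigvals, eigvecs = pca_fit(spectra)
pcs = pca_project(spectra, mean, eigvecs, p=3)
approx = pca_reconstruct(pcs, mean, eigvecs)
explained = eigvals[:3].sum() / eigvals.sum()  # fraction of variance retained
```

In the toy data three components capture nearly all of the variance by construction; for real spectra the number of components to retain would be read off the cumulative sum of the sorted eigenvalues in the same way.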

The implementation of this method in the ANN framework is
straightforward. We use the same input layer as in
Section 4.2, i.e. 12 neurons corresponding to the selected
galaxy properties, and 60 neurons in the hidden layer. The
number of output neurons is defined by the number of principal
components we use. Once we establish this number, and having
calculated the principal components and eigenvectors of the
training sample, we use the ANN to predict the principal
components instead of the full spectrum. The network
architecture in the case of p PCs is 12:60:p. The final step
is to reconstruct the full spectrum from the predicted
principal components, using Eq. 11, where $E_p$ and $\bar{x}$ are the
eigenvectors and mean of the training sample, and A contains
the predicted PCs. We use the same procedures and ANN
parameters as defined in the previous subsection.

The statistics of the error distribution for this method
are presented in Table 2. Using the principal component
decomposition instead of the full spectrum does not lead
to a dramatic improvement in the accuracy of the ANN.
The results achieved with this technique are similar to those
in the previous subsection, with, perhaps, a slight
improvement: approximately 80% of the predicted spectra now
show bolometric luminosities which lie within 10% of the
original value, and $\varepsilon_L$ = 0.19. Similar results are obtained
for the different bands. Interestingly, both methods show similar
Band            $\varepsilon_L$  $P_{|e|<10\%}$      P1       Q1      Q2      Q3     P99

Bolometric           0.19            78.3          -33.8     -4.2     0.5     5.4    27.7
0.2 μm               0.20            58.4          -64.2     -8.0     0.4     8.4    47.1
B (0.44 μm)          0.11            74.5          -31.6     -4.2     0.6     5.2    28.3
8 μm                 0.16            78.7          -80.2     -3.9     0.5     4.3    30.7
850 μm               0.32            63.1         -143.1     -6.8     0.2     7.0    57.7

Table 2. Summary statistics for the error distribution of the spectra predicted by the ANN when using 20 principal components as the
output layer. Statistics are quoted for the bolometric luminosity and four different bands: FOCA 0.2 μm, B, IRAC 8 μm and SCUBA
850 μm. The description of the various quantities is given in Table 1.



Number of                Bol. $\varepsilon_L$    $P_{|e|<10\%}$
Principal Components

  1                          0.79                16.5
  3                          0.27                27.8
  5                          0.22                62.2
 10                          0.20                76.6
 20                          0.19                78.3
 50                          0.18                77.7
100                          0.19                79.2
456                          0.20                81.1

Table 3. Summary statistics of the distribution of the predicted
bolometric luminosity, for ANNs using different numbers of
principal components (see Table 1 for a description of the quantities).



difficulties when predicting the spectra, i.e. if a particular 
spectrum was not forecasted accurately by the method de- 
scribed in the previous subsection, then it is likely that it will 
differ substantially from its original value when the principal 
components are used to describe the spectrum. This is ex- 
pected because with 20 PCs we only lose 1% of the spectral 
information. 

The main advantage of compressing the spectra using 
PCA is the reduced computing time compared with using 
the full spectrum: the time required is reduced by a factor 
of ~ 15. 

Table 3 shows the errors associated with the bolometric
luminosities of predicted spectra, using ANNs which represent
the spectra with different numbers of principal components.
The ANN predictions using PCA do not significantly improve
when using more than 10 PCs, as indicated by the percentage
of spectra with errors below 10%, which is approximately
constant at ≈ 80%. We remind the reader that most of the
wavelength range of the spectrum can be reconstructed with 10
or more principal components. Hence we expect the ANN to
behave in a similar way as it does when using the full
spectrum.

4.4 Predicting the luminosity in a single band 

As we saw in the previous subsection, using just a few prin- 
cipal components of the spectrum, instead of the full 456 
flux bins, facilitates the training process due to the reduced 
number of internal ANN weights which need to be adjusted. 
However, the gain in accuracy is marginal. In this section 
we explore another possible route to improve the accuracy 
of the ANN: the prediction of the luminosity in a single band 
instead of the full spectrum. The ANN becomes simpler in 



the sense that we only need to predict one variable, the 
band-pass luminosity. We would naturally expect the ANN 
to perform better for one band than in the case of trying 
to predict the full spectrum, as the power of the ANN is 
focused over a narrow range of wavelength. However, there 
is one drawback. If we require the luminosity in a range of
bands, then with this approach we would need to train the
ANN for each band in turn.

To predict band luminosities we need to preprocess
the training set spectra to calculate luminosities in a
predefined set of bands (in this section we use the
following bands: FOCA 0.2 μm, B (0.44 μm), IRAC 8 μm and
SCUBA 850 μm). We will start with a network configuration
of 12:60:1, i.e. 12 neurons in the input layer, one hidden
layer with 60 neurons and 1 output neuron corresponding to
the desired luminosity. The ANN is trained separately for
each of the selected bands, using procedures and parameters
similar to those used in Section 4.2. The results are shown
in Fig. 5 and the statistics of the distribution of the
predicted luminosity are given in Table 4.

The approach of training the ANN to predict one band
at a time performs much better than the previous neural
nets. The proportion of galaxies with predicted luminosities
which are within 10% of the true luminosity is significantly
higher than before: 88% averaged over the four bands,
compared with ~ 68% when the full spectra were used as the
output. A similar improvement is seen in the root mean square
logarithmic error. The results are particularly impressive for
the B-band, where $\varepsilon_L$ ~ 0.07 and more than 95% of the
population have predicted luminosities within 10% of the
original. As with the previous ANN, the results in the UV and
infrared/submillimetre bands are not as good as in the
B-band, due to the increased variety in the model spectra at
these wavelengths. Nevertheless, the performance of the ANN
is still markedly better than before even at these
wavelengths (see Fig. 5). It is also apparent from this plot
that the scatter around the desired relation increases
slightly with the luminosity of the galaxy. No correlation
was found between the errors and the other galaxy properties
at the output redshift.

As mentioned earlier, there is also the possibility of us- 
ing the ANN to predict n luminosities at a time. The advan- 
tage of this is that we would only need to train the network 
once, instead of having to train it n times, once for each 
band. Table 5 lists the statistics of the error distribution
using this procedure. The ANN is trained using four neurons in



^ Note that later on we explore the performance of the ANN
when predicting more than one band at a time.
Figure 5. The logarithm of the ratio of predicted to true luminosity, using the ANN applied to the prediction of a single band. From
left to right, top to bottom we plot the ratios for the FOCA 0.2 μm, B, IRAC 8 μm and SCUBA 850 μm bands. The solid and dashed
lines have the same meaning as in Fig. 4.
Band            $\varepsilon_L$  $P_{|e|<10\%}$      P1       Q1      Q2      Q3     P99

FOCA 0.2 μm           …               …              …       -3.4      …       …       …
B (0.44 μm)          0.07            95.3            …        …        …       …     16.5
8 μm                 0.16            88.5            …       -2.2      …       …       …
850 μm               0.24            89.1          -66.1     -2.5     0.0      …     29.5

Table 4. Summary statistics for the distribution of the error in the luminosities (FOCA 0.2 μm, B, IRAC 8 μm and SCUBA 850 μm)
predicted by the ANN, when using one output neuron. See description of quantities in Table 1.
Band            $\varepsilon_L$  $P_{|e|<10\%}$      P1       Q1      Q2      Q3     P99

FOCA 0.2 μm          0.15            79.5          -34.5     -3.8     0.1     3.5    42.4
B (0.44 μm)          0.08            95.2          -15.3     -1.1     0.1     1.3    17.7
8 μm                 0.13            88.9          -28.8     -2.6    -0.2     2.4    24.3
850 μm               0.25            80.7          -68.1     -4.1     0.0     3.6    50.1

Table 5. Summary statistics for the distribution of the error on the luminosities predicted by the ANN, using four output neurons:
FOCA 0.2 μm, B, IRAC 8 μm and SCUBA 850 μm. For a description of the quantities see Table 1.



the output layer, one for each band (FOCA 0.2 μm, B, IRAC
8 μm and SCUBA 850 μm). This variant method performs
slightly worse than the case in which the ANN is trained
four times, once for each of the luminosity bands, with
80%, 95%, 89% and 81% of galaxies having predicted
luminosities within 10% of the true luminosity in the
FOCA 0.2 μm, B, IRAC 8 μm and SCUBA 850 μm bands,
respectively. Nevertheless, it still outperforms the first two
methods that we explored, in which the full spectrum was
predicted.

Unless otherwise stated, from now on we focus on the 
predictions of the best performing method as described at 
the beginning of this subsection, namely using the ANN to 
predict the luminosity in one band at a time. 

4.5 Performance for different ANN choices 

In this section we explore the architecture of the ANN, along 
with some of the parameter space of the ANN, the sample 
extraction and the effect of using different redshifts. 

4.5.1 Architectures

So far, we have used a supervised feed-forward neural
network with 12 input neurons, one hidden layer with 60
neurons and an output layer with one or more neurons,
depending on the method under consideration. The numbers of
input and output neurons are effectively determined by the
setup of the problem. On the other hand, the number of hidden
layers and the number of neurons each contains are parameters
that are more subjective. Currently, there is no clear
consensus on how many hidden units should be used (see
Scarselli & Tsoi 1998). Each application will have its optimal
set of parameters, which can only be found by trial and
error. In Fig. 6 we show the evolution of the percentage of
galaxies with predicted luminosities which lie within 10% of
the true luminosity, as a function of the number of neurons
used in the hidden layer of the ANN. Each curve shows the
performance for a different filter. The training process for all
configurations was stopped after 5000 epochs. We see that
there is little variation in the performance of the network
for architectures with more than 20-30 neurons in the hidden
layer. For two hidden layers, the results are similar to
those found for one layer, for the same total number of
neurons. However, in some cases, for example the FOCA 0.2 μm
band, the use of two hidden layers seems to help slightly in
terms of $P_{|e|<10\%}$, which changes from 80.4% to 82.5%
for one and two layers respectively (there is also a reduction
of the SCUBA 850 μm $\varepsilon_L$ by a factor of ≈ 2 when two layers
are used). The use of more than two layers does not improve



Figure 6. The evolution of the percentage of galaxies with
predicted luminosities which lie within 10% of the true luminosity,
as a function of the number of neurons in the hidden layer. For all
configurations tested, we stopped the training process after 5000
epochs. The violet, blue, red and green lines show the results for
the FOCA 0.2 μm, B, IRAC 8 μm and SCUBA 850 μm bands,
respectively.



the results further. Henceforth, we will use a configuration 
of 12:30:30:1 throughout this paper. 



4.5.2 Number of training epochs

As explained in Section 3, two methods can be used to stop
the training process in order to avoid overfitting: applying a
pre-defined error threshold or setting a maximum number of
epochs. In this paper, we use a maximum number of epochs,
5000, as the criterion for early stopping. In Fig. 7 we show
how the percentage of galaxies with predicted luminosities
within 10% of the true luminosity ($P_{|e|<10\%}$) depends on
the number of epochs, when using a network configuration
of 12:30:30:1. All four networks show rapid convergence:
after 1000 iterations, $P_{|e|<10\%}$ in the FOCA 0.2 μm, B, IRAC
8 μm and SCUBA 850 μm bands is 79%, 94%, 88% and
87%, respectively. The plot shows that, by 5000 epochs,
the networks have already converged to their optimal states,
beyond which there is no noticeable change in $P_{|e|<10\%}$.
The value of 5000 epochs was kept as the early-stopping
parameter for the training of the ANN.
Activation     FOCA 0.2 μm       B (0.44 μm)       IRAC 8 μm         SCUBA 850 μm
function       $\varepsilon_L$   P       $\varepsilon_L$   P       $\varepsilon_L$   P       $\varepsilon_L$   P

Elliot         0.14  78.6        …     …           …     87.5        0.23  79.6
Sigmoid        0.14  80.1        0.07  95.3        0.16  88.5        0.24  89.1
Gaussian       0.16  77.9        0.07  94.6        0.14  87.7        0.27  74.6
Linear         0.55  20.5        0.22  49.7        0.39  22.0        1.08  11.0

Table 6. The performance of neural nets with different hidden layer activation functions: elliot, sigmoid, gaussian and linear. Here P denotes $P_{|e|<10\%}$ (see Table 1
for a description of the statistical quantities).
Figure 7. The dependence of the percentage of galaxies with
predicted luminosities within 10% of the true luminosity, $P_{|e|<10\%}$,
on the number of training epochs, for an ANN configuration of
12:30:30:1. The violet, blue, red and green lines show the results
for the FOCA 0.2 μm, B, IRAC 8 μm and SCUBA 850 μm bands,
respectively. The dotted line shows 5000 training epochs, for
reference.

4.5.3 Choice of activation function

Activation functions are needed in order to add nonlinearity
to the training process. So far, for the hidden neurons, we
have been using the sigmoid function, given by
$f(x) = 1/(1 + e^{-\alpha x})$, where the coefficient α is commonly
referred to as the steepness of the activation function; the
steepness of choice is α = 0.02. For the output neurons, the
linear activation function, $f(x) = \alpha x$, was adopted.

In Table 6 we quote the performance of ANNs with
different activation functions for the hidden layers for the
FOCA 0.2 μm, B, IRAC 8 μm and SCUBA 850 μm bands.
As expected, the best results are achieved with the nonlinear
functions. The sigmoid activation function seems to slightly
outperform the elliot and gaussian ones. A linear activation
function removes all nonlinearity from the neural network,
and consequently the ANN will be no more than a simple
perceptron (see Section 3). As is clear from Table 6, the
capabilities of the neural network are greatly reduced when
we use only linear correlations. The errors associated with
the predicted luminosities increase, which leads to merely
11% of galaxies with errors smaller than 10% in the SCUBA
850 μm band. Similarly poor results are obtained in the
other bands.
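
The statement that a network with linear hidden neurons reduces to a single linear map can be checked numerically. The sketch below uses random toy weights for a 12:60:1 network and is purely illustrative; the helper names are hypothetical, the linear activation is taken as the identity (a constant slope would simply be absorbed into the weights), and the sigmoid is written in the logistic form with a steepness parameter.

```python
import numpy as np

def sigmoid(x, steepness):
    """Logistic sigmoid with an adjustable steepness."""
    return 1.0 / (1.0 + np.exp(-steepness * x))

def forward(x, w1, b1, w2, b2, activation):
    """One hidden layer followed by a linear output layer."""
    hidden = activation(x @ w1 + b1)
    return hidden @ w2 + b2

# Random toy weights for a 12:60:1 network (illustrative only).
rng = np.random.default_rng(2)
x = rng.normal(size=(5, 12))
w1, b1 = rng.normal(size=(12, 60)), rng.normal(size=60)
w2, b2 = rng.normal(size=(60, 1)), rng.normal(size=1)

# Linear hidden layer: the two layers collapse into one linear map.
linear_out = forward(x, w1, b1, w2, b2, activation=lambda a: a)
w_eff = w1 @ w2
b_eff = b1 @ w2 + b2
collapsed_out = x @ w_eff + b_eff

# Nonlinear hidden layer: no single linear map reproduces the output in general.
sigmoid_out = forward(x, w1, b1, w2, b2, activation=lambda a: sigmoid(a, 1.0))
```

Here `linear_out` and `collapsed_out` agree to machine precision, so the 60 hidden neurons add no modelling freedom beyond 12 effective weights and one offset when the activation is linear.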

In view of these results, we will use the sigmoid activa- 
tion function throughout this paper. 

4.6 ANN Performance: Normal and burst galaxy 
samples 

Until now, we have been using a sample extracted from the
GALFORM catalogue following a similar procedure to that
outlined in Granato et al. (2000) for "normal galaxies" (see
definition below). In this subsection, we distinguish between
quiescent and burst galaxies, and analyse the performance
of the ANN in both cases. Quiescent and burst galaxies
are sampled differently in the Baugh et al. (2005) model at
z = 0. It is rare to catch a galaxy undergoing a starburst, so
it is necessary to sample the bursts carefully to build up a
statistical sample. In the present work, we do not calculate
the burst spectrum for a fixed set of times after the start
of the burst, as done in Granato et al. (2000). Instead, we
enlarge the burst sample by simply increasing the volume of
the simulation run in GALFORM.

A galaxy is considered to be a burst galaxy, i.e. an ongoing
burst or a recent burst in which the stars formed in
the burst still have an impact on the spectral energy
distribution, if $t_{\rm burst} \le 10\,\tau_e$, where $t_{\rm burst}$ is the time since
the start of the most recent burst, and $\tau_e$ is the effective
e-folding time for the starburst (we assume that the burst
terminates after 3 e-folds). In Table 7 we show the
performance of the ANN when applied to the quiescent and the
burst samples separately. Here, the "normal sample"
represents the sample extracted from the GALFORM catalogue using
the procedure described previously, selecting equal numbers
of galaxies in logarithmic bins of total stellar mass. This
selection also picks a small fraction of galaxies that are
undergoing a burst. The quiescent sample is a "bursts-clean"
version of the normal sample, i.e. a normal sample for which
we selected galaxies with $t_{\rm burst} > 10\,\tau_e$ or which had no
burst in their history. The burst sample was extracted
differently, selecting equal numbers of galaxies in logarithmic
bins of mass of stars formed in the most recent burst of star
formation. Table 7 clearly shows the difficulty experienced
by the network in predicting the spectra of burst galaxies,
which is mainly due to the large variety in their spectra.
When we train the ANN using the normal sample and apply
this to predict the burst sample, the accuracy of the
ANN is greatly reduced. The four-band $P_{|e|<10\%}$ average, in
this case, drops from 88% to 17%, mainly due to the
submillimetre band. In order to improve these results, we constructed
Training    Predicted    FOCA 0.2 μm       B (0.44 μm)       IRAC 8 μm         SCUBA 850 μm
sample      sample       $\varepsilon_L$   P       $\varepsilon_L$   P       $\varepsilon_L$   P       $\varepsilon_L$   P

…
…           …            …     …           …     97.1        0.08  89.1        0.07  94.8
Burst       Burst        0.22  59.9        0.07  88.7        0.15  63.9        0.47  79.1

Table 7. The performance of several neural networks, trained with different galaxy samples: "normal", "quiescent" and "bursts". Here P denotes $P_{|e|<10\%}$. See
text for further details, and also Table 1 for a description of the quantities listed.



separate quiescent and burst samples as described above.
Training and testing the network independently for these
two samples produces better results. For quiescent galaxies,
$\varepsilon_L$ < 0.12 for the four bands studied, with equally impressive
results for the distribution of errors (~ 90% of the galaxies
show predicted luminosities with less than 10% error). The
results for burst galaxies are somewhat less impressive,
particularly for the FOCA 0.2 μm and IRAC 8 μm bands, where
$P_{|e|<10\%}$ ~ 60-70%. However, they clearly outperform the
ANN trained using the normal sample, with a four-band
$P_{|e|<10\%}$ average of 73%. Henceforth, we shall train the ANN
for burst and quiescent galaxy samples separately.
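
The split into burst and quiescent samples amounts to a simple mask on the time since the last burst. The sketch below is a hedged illustration: the function name, the use of NaN to flag galaxies with no burst in their history, the inclusive boundary and the toy values (in arbitrary time units) are all assumptions made for the example.

```python
import numpy as np

def is_burst(t_burst, tau_e, n_efolds=10.0):
    """True where the most recent burst started within n_efolds e-folding times.

    Assumption: galaxies with no burst in their history carry NaN in t_burst.
    """
    t_burst = np.asarray(t_burst, dtype=float)
    tau_e = np.asarray(tau_e, dtype=float)
    has_burst = ~np.isnan(t_burst)
    with np.errstate(invalid="ignore"):  # comparisons with NaN evaluate to False
        recent = t_burst <= n_efolds * tau_e
    return has_burst & recent

# Toy galaxies: a recent burst, an old burst, and no burst at all.
t_burst = np.array([0.05, 2.0, np.nan])
tau_e = np.array([0.01, 0.05, np.nan])

burst_mask = is_burst(t_burst, tau_e)
burst_sample = np.flatnonzero(burst_mask)        # indices used to train the burst ANN
quiescent_sample = np.flatnonzero(~burst_mask)   # indices used to train the quiescent ANN
```

Each index set would then feed its own network, trained and validated independently of the other.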



4.6.1 ANN Performance: different output redshifts

It is necessary to analyse how the trained ANN performs
at different redshifts. If it turned out to be the case that
an ANN trained at one redshift performed equally well at
other redshifts, then there would be no need to retrain the
network to predict galaxy luminosities at different redshifts,
thus saving computing time. So far in this paper, we have
analysed the ANN predictions only at redshift z = 0. Table 8
compares the root mean square logarithmic error and the
percentage of galaxies with predicted luminosities (in the
rest frame) that lie within 10% of the original values, for
normal samples (see definition above) at redshifts z = 0 and
z = 2. The network trained at z = 0 performs reasonably
well when used to predict the luminosities at z = 2: averaged
over the four bands, it is capable of reproducing ~ 52% of the
luminosities within an error of 10%, with $\varepsilon_L$ smaller than
~ 0.6 at wavelengths of 0.2 μm, 0.44 μm (B-band), 8 μm and
850 μm. However, it is strongly advisable to train the ANN at
the redshift of choice, as indicated by the table. If we train
the net at z = 2 instead and apply it at z = 2, the predicted
luminosities are much more accurate (we achieve a four-band
$P_{|e|<10\%}$ average of 87%). Our approach will be to train the
ANN at each redshift for which it is applied.



4.7 Error analysis 

Previously, we tested the performance of the ANN when 
applied to predicting the quiescent and burst samples sep- 
arately, and found that the errors associated with the pre- 
diction of the burst sample are larger than those for the 
quiescent galaxies. The percentage of accurately predicted 
luminosities was particularly low at UV wavelengths
(approximately 60%). We now analyse further the error
distribution associated with the predicted FOCA 0.2 μm band
for the burst sample.



In Fig. 8 we plot the percentage error of the predicted
luminosities as a function of their true, expected, values. As
presented in Table 7, we find that ~ 60% of the burst galaxies
have predicted luminosities within 10% of the true values. The
plot shows that the errors do not seem to be correlated with
true UV luminosities. In fact, we find a weak correlation
coefficient of 0.12. The independence of the errors from the
luminosity will be an important factor when calculating
luminosity-dependent quantities (e.g. luminosity functions)
and sampling using luminosity.
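
A correlation check of this kind can be sketched as follows. The text does not state which correlation coefficient is used, so the Pearson coefficient below is an assumption, as are the function name and the synthetic luminosities and errors.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson linear correlation coefficient between two 1-D samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

# Synthetic sample: errors drawn independently of luminosity (illustrative only).
rng = np.random.default_rng(3)
log_lum = rng.uniform(8.0, 11.0, size=5000)
log_error = rng.normal(0.0, 0.1, size=5000)

r_independent = pearson_r(log_lum, log_error)          # close to zero
r_dependent = pearson_r(log_lum, 2.0 * log_lum + 1.0)  # exactly linear relation
```

A coefficient near zero only rules out a linear trend; a luminosity-dependent scatter, for example, would need to be checked separately from the percentile ranges.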

In an effort to further reduce the error associated with
the predicted luminosities, we also investigated the relation
between the errors and various galaxy properties for the
burst sample. In Fig. 9 we plot the logarithm of the ratio of
predicted to true luminosity against: bolometric luminosity
($\log L_{\rm bol}$), central V-band extinction optical depth for the
burst component ($\tau_V^{\rm burst}$), disc and bulge sizes
($r_{\rm disc/bulge}$) and circular velocities ($v_{\rm disc/bulge}$),
bulge-to-total mass ratio (B/T), disc, bulge and total stellar
mass ($M_{\rm disc/bulge/tot}$), stellar metallicity ($Z^{*}_{\rm tot}$), the
metallicity of the cold gas in the burst ($Z_{\rm cold}^{\rm burst}$), the
mass of stars formed in the last burst ($M_{\rm burst}$) and the mass
of the host halo ($M_{\rm halo}$). The plot reveals no clear
correlation between the error associated with the predicted
0.2 μm luminosities and any galaxy property. The absolute
value of the correlation coefficient is smaller than 0.05 for
most of the properties, with slightly higher values (~ 0.12)
found for disc size and stellar mass. This implies that any
sample built using the ANN method will have errors which
are decoupled from the galaxy properties. Different network
configurations were tried in order to improve the
performance, but without any notable success.



5 PREDICTING LUMINOSITY FUNCTIONS 

We are now ready to predict accurate luminosities for much
larger GALFORM samples. In the previous section, we
presented the first results from the new ANN technique. We
calculated the luminosities for a sample of GALFORM galaxies
in the FOCA 0.2 μm, B (0.44 μm), IRAC 8 μm and SCUBA
850 μm bands, and found that for more than 80% of the
galaxies the predicted luminosity was within 10% of the true
luminosity. In this section, we show the impact of the error
in the ANN predictions on the form of the luminosity
function. It is important to closely reproduce the original model
luminosity function, because this quantity is the most basic
statistical description of a galaxy population. Here, we
present predictions for the luminosity functions of
Lyman-break galaxies at z = 3, for galaxies selected in the
mid-infrared at z = 0.5, and SCUBA galaxies at z = 2.
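
The comparison between the original and ANN-predicted luminosity functions can be sketched as a binned number density per unit volume per dex. The code below is an illustration only: the function name, the optional per-galaxy `weights` (which would be needed if galaxies are not sampled uniformly in volume), the toy luminosities and the volume are all assumptions.

```python
import numpy as np

def luminosity_function(log_lum, volume, bin_edges, weights=None):
    """Number density of galaxies per unit volume per dex in luminosity.

    Assumption: `weights` optionally carries a per-galaxy sampling weight.
    """
    counts, _ = np.histogram(log_lum, bins=bin_edges, weights=weights)
    return counts / (volume * np.diff(bin_edges))

# Toy sample: true log-luminosities and ANN predictions with small errors.
bin_edges = np.array([9.0, 9.5, 10.0, 10.5])
log_lum_true = np.array([9.1, 9.2, 9.7, 10.4])
log_lum_ann = log_lum_true + np.array([0.02, -0.03, 0.01, -0.02])

phi_true = luminosity_function(log_lum_true, volume=2.0, bin_edges=bin_edges)
phi_ann = luminosity_function(log_lum_ann, volume=2.0, bin_edges=bin_edges)
```

In this toy case the prediction errors are much smaller than the bin width, so no galaxy changes bin and the two luminosity functions coincide; errors comparable to the bin width would scatter galaxies between neighbouring bins.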
Trained    Applied    FOCA 0.2 μm       B (0.44 μm)       IRAC 8 μm         SCUBA 850 μm
at         at         $\varepsilon_L$   P       $\varepsilon_L$   P       $\varepsilon_L$   P       $\varepsilon_L$   P

…          …          0.14  …           0.07  95.3        …     88.5        0.24  89.1
z = 0      z = 2      0.62  58.5        0.36  63.4        0.50  53.1        0.56  34.1
z = 2      z = 2      0.29  80.9        0.17  89.7        0.14  86.1        0.26  89.3

Table 8. The performance of an ANN trained with galaxy samples extracted at different redshifts: z = 0 and z = 2. Here P denotes $P_{|e|<10\%}$. See Table 1 for a
description of the quantities.
Figure 8. The distribution of the error associated with the 0.2 μm luminosities predicted by the ANN, for a sample of burst galaxies at
z = 0. In the left panel, the solid and dashed lines have the same meaning as in Fig. 4. In the right panel, the range of the x-axis is set
to the 1st and 99th percentiles of the distribution, and the histogram is normalized to unity.



Throughout this section we use a 12:30:30:1 ANN ar- 
chitecture, which corresponds to 12 galaxy properties as the 
input layer, 30 neurons in each of the two hidden layers and 
one output neuron representing the galaxy luminosity in a 
given band (see the previous section for further details) . The 
sigmoid function was adopted as the activation function of 
choice for the hidden neurons. Driven by the conclusions of 
the previous section, we split our sample into two: quiescent 
and burst galaxies, and train the network for each of these 
populations at the selected redshift. 
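
To make the 12:30:30:1 layout concrete, the following is a minimal sketch of the forward pass of such a network. It is an illustration only and not the code used in this work: the random weight initialisation, the linear output neuron and the helper names (`init_network`, `forward`, `count_parameters`) are assumptions, and no training step is shown.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid, used here as the hidden-layer activation."""
    return 1.0 / (1.0 + np.exp(-x))

def init_network(layer_sizes, seed=0):
    """Create random (weights, biases) pairs for consecutive layers.

    The initialisation scheme is an assumption made for illustration.
    """
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
        biases = np.zeros(n_out)
        params.append((weights, biases))
    return params

def forward(params, inputs):
    """Propagate galaxy properties through the network.

    Hidden layers use the sigmoid; the single output neuron is taken
    to be linear (an assumption, not stated in the text).
    """
    activations = np.asarray(inputs, dtype=float)
    for weights, biases in params[:-1]:
        activations = sigmoid(activations @ weights + biases)
    weights, biases = params[-1]
    return (activations @ weights + biases)[..., 0]

def count_parameters(params):
    """Total number of weights and biases in the network."""
    return sum(w.size + b.size for w, b in params)

# 12 input properties, two hidden layers of 30 neurons, 1 output.
layer_sizes = (12, 30, 30, 1)
network = init_network(layer_sizes)

# Mock, already-normalised input properties for 5 galaxies.
mock_properties = np.random.default_rng(1).normal(size=(5, 12))
predicted = forward(network, mock_properties)
n_parameters = count_parameters(network)
```

In this sketch one such network would be built per band and per population (quiescent or burst), mirroring the split described above.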



5.1 Lyman-break Galaxies at z = 3

Lyman-break galaxies (LBGs) were the first significant high
redshift galaxy population to be isolated, and were used to
measure the star formation history of the Universe at early
epochs (Steidel et al. 1999). These high redshift galaxies are
selected from photometry in several optical bands which
straddle the Lyman break at 912 Å in a galaxy's rest frame
for objects in the target redshift range (z > 2).

In this section we use the ANN to predict the luminosities
of z = 3 galaxies at a rest frame UV wavelength of 0.17 μm,
which corresponds to the observer frame R band at this
redshift. We compare the luminosity function of the training
set as computed using GRASIL spectra with that obtained
using the luminosities predicted by the ANN.

In Fig. 10 we plot the logarithmic error associated with
the ANN predicted UV luminosities against the original
luminosities (the target values), for quiescent (top-left panel)
and burst (top-right panel) galaxies. In the lower panels we
plot the error distribution for both samples. The statistics
of these distributions are summarized in Table 9. In the UV
regime, the ANN works better for quiescent galaxies than
for burst galaxies. For most of the quiescent galaxies, 88%
of the sample, the ANN predicts luminosities within 10% of
the true value, and only ~4% of galaxies have errors larger
than 20%. Burst galaxies show a somewhat bigger scatter
around the expected values, which is driven by the higher
intrinsic variety in their spectra at 0.17 μm. The value of ε_L
is correspondingly higher for bursts, 0.40, compared with 0.20
for quiescent galaxies. As noted in the previous section,
Fig. 10 shows that the error distribution is approximately
independent of luminosity.
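
The summary statistics quoted in the tables of this section can be computed along the following lines. This is a hedged sketch rather than the definition used in the paper: the percentage error is assumed here to be 100 (L_pred - L_true)/L_true, the logarithmic error is assumed to be log10(L_pred/L_true), and the function name `error_statistics` is hypothetical. The sign convention and normalisation adopted in the earlier sections may differ.

```python
import numpy as np

def error_statistics(l_pred, l_true):
    """Summarise the error of predicted luminosities.

    Assumed definitions (illustrative only):
      log error     = log10(L_pred / L_true)
      percent error = 100 * (L_pred - L_true) / L_true
    """
    l_pred = np.asarray(l_pred, dtype=float)
    l_true = np.asarray(l_true, dtype=float)
    log_error = np.log10(l_pred / l_true)
    percent_error = 100.0 * (l_pred - l_true) / l_true
    p1, q1, q2, q3, p99 = np.percentile(percent_error, [1, 25, 50, 75, 99])
    return {
        "rms_log_error": float(np.sqrt(np.mean(log_error ** 2))),
        "percent_within_10": float(100.0 * np.mean(np.abs(percent_error) < 10.0)),
        "P1": float(p1),
        "Q1": float(q1),
        "Q2": float(q2),
        "Q3": float(q3),
        "P99": float(p99),
    }

# Tiny mock sample: two good predictions and two poor ones.
l_true = np.array([1.0, 1.0, 1.0, 1.0])
l_pred = np.array([1.0, 1.05, 0.5, 2.0])
stats = error_statistics(l_pred, l_true)
```

Computing these quantities separately for the quiescent and burst samples gives one table row per population.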

The rest frame 0.17 μm luminosity functions at z = 3
are plotted in Fig. 11. The luminosity function constructed
using the luminosities predicted by the ANN (dashed
lines) is in excellent agreement with the "true" luminosity
function which uses the luminosities calculated directly
with GRASIL (solid lines). This agreement holds for quiescent
and burst galaxies separately, in spite of the somewhat
Figure 9. The relation between the logarithmic error associated with the λ = 0.2 μm luminosities predicted by the ANN and various
galaxy properties, for the sample of burst galaxies. From left to right and top to bottom we consider: bolometric luminosity, central
V-band extinction optical depth, disc and bulge size, disc and bulge circular velocity, bulge-to-total mass ratio, disc, bulge and total
stellar mass, stellar metallicity and metallicity of the cold gas in the burst, mass of stars formed in the last burst and mass of host halo.
The meaning of the dashed lines is the same as in Fig. 4. The zero error line is plotted for reference (dotted line).



Sample     ε_L   P_{|e|<10%}   P1      Q1    Q2    Q3   P99

Quiescent  0.20  88.4          -72.1   -1.0  0.0   0.7  41.1
Burst      0.40  54.2          -329.7  -9.9  -0.1  9.5  67.6

Table 9. Statistics for the error distribution of the rest frame UV luminosities predicted by the ANN, for quiescent and burst galaxies
at z = 3. The listed quantities are described in Table 1.



larger errors in the case of burst luminosities. We carried out
an independent test of this comparison, by perturbing the
GRASIL luminosities by the error distribution of the ANN
predictions, and reached the same conclusion. At bright
luminosities, the true luminosity function is closer to a power
law than an exponential break, and so the errors would need
to be much larger before they would lead to an appreciable
difference in the luminosity function predicted by the ANN.
The lack of a dependence of the size of the error on UV
luminosity also helps to keep the shape of the luminosity
function (see Fig. 10) predicted by the ANN close to the
true one.
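
The perturbation test described above can be sketched as follows. This is a hedged illustration with mock data: the uniform distribution of true luminosities, the Gaussian stand-in for the ANN error distribution, the volume and the helper names are all assumptions, chosen only to show the mechanics of drawing errors and rebuilding the luminosity function with Poisson error bars.

```python
import numpy as np

def luminosity_function(log_l, bin_edges, volume):
    """Number density per dex in each bin, with its Poisson error."""
    counts, _ = np.histogram(log_l, bins=bin_edges)
    norm = volume * np.diff(bin_edges)
    return counts / norm, np.sqrt(counts) / norm

def perturb_log_luminosities(log_l_true, log_errors, rng):
    """Add a randomly drawn logarithmic error to each true log-luminosity.

    Errors are drawn independently of luminosity, which relies on the
    error distribution being uncorrelated with luminosity.
    """
    drawn = rng.choice(log_errors, size=len(log_l_true), replace=True)
    return log_l_true + drawn

rng = np.random.default_rng(42)

# Mock inputs (assumptions, not model output).
log_l_true = rng.uniform(8.5, 11.5, size=20000)   # "GRASIL" log-luminosities
log_errors = rng.normal(0.0, 0.04, size=5000)     # stand-in ANN log errors
bin_edges = np.linspace(8.0, 12.0, 17)            # 0.25 dex bins
volume = 1.0e6                                    # arbitrary volume

phi_true, phi_true_err = luminosity_function(log_l_true, bin_edges, volume)
log_l_perturbed = perturb_log_luminosities(log_l_true, log_errors, rng)
phi_perturbed, _ = luminosity_function(log_l_perturbed, bin_edges, volume)

# Difference between the two estimates in units of the Poisson error.
significance = (phi_perturbed - phi_true) / phi_true_err
```

A perturbed luminosity function that stays within the Poisson error bars of the original one indicates that errors of this size do not change the statistic appreciably.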



5.2 Mid-Infrared Galaxies at z = 0.5 

The development of new space-based infrared telescopes in
the 1980s opened up a new window in the electromagnetic
spectrum, allowing us to see galaxies which are heavily
obscured in the optical and UV, but which have substantial
emission in the IR. This IR emission is the result of the
heating of the dust when it absorbs starlight and subsequently
re-emits the energy at longer wavelengths. Further
studies made it clear that an important fraction of the star
formation in the Universe is obscured by dust (Smail et al.
1997; Hauser et al. 1998; Hughes et al. 1998). These discoveries
made clear the importance of dusty galaxies for
understanding how galaxies are made. Any complete model of
galaxy formation must be able to make accurate predictions
for the emission from galaxies at IR wavelengths.

Figure 10. A comparison between the ANN predicted rest frame UV (λ = 0.17 μm) luminosities and the original luminosities, as
extracted from GRASIL model spectra. The left and right panels show the results for quiescent and burst galaxies, respectively. The solid
and dashed lines in the upper panels have the same meaning as in Fig. 4. The distribution of the logarithm of the ratio of predicted to
true luminosity is presented in the lower panels. The range of the x-axis for the lower panels is set to the 1st and 99th percentiles of the
distribution for bursts, and the histograms are normalized to unity.

This section focuses on the mid-IR emission from galaxies.
We predict the luminosities in the MIPS 24 μm band as
used in recent infrared observations by the Spitzer satellite
(see Lacey et al. 2008 for detailed comparisons between
the GALFORM plus GRASIL model predictions and observational
data). We selected a galaxy population at z = 0.5.

Fig. 12 shows the ratio of the predicted to true luminosity
at rest-frame 24 μm, for quiescent and burst galaxies.
The associated statistics are summarized in Table 10.
As we found in the previous section, the ANN method performs
better for quiescent galaxies; at rest-frame 24 μm,
the root mean square logarithmic error is ε_L ~ 0.13 and
P_{|e|<10%} ~ 87%. For burst galaxies, the performance is not
as good. The percentage of galaxies with errors within 10% is
60%. Different ANN architectures and input galaxy properties
were tried without any improvement over these figures.
As before, the error distribution is not correlated with
luminosity. This suggests that the errors should not change the
shape of the luminosity function. We also found no correlation
between the error and several other galaxy properties.

Figure 12. The errors on the observer frame 24 μm luminosity predicted by the ANN for galaxies at z = 0.5. The results for quiescent
and burst galaxies are shown in the left and right panels respectively. The solid and dashed lines in the upper panels have the same
meaning as in Fig. 4. The lower panels show the distribution of the logarithmic error. The range of the x-axes is set to the 1st and 99th
percentiles of the distribution, and the histograms are normalized to unity.



Sample     ε_L   P_{|e|<10%}   min    Q1    Q2    Q3   max

Quiescent  0.13  86.5          -38.4  -2.4  -0.1  1.9  25.7
Bursts     0.18  60.0          -79.3  -8.6  -0.4  7.3  37.5



Table 10. Summary statistics for the distribution of the error on the 24 μm luminosities predicted by the ANN, for quiescent and burst
galaxies at z = 0.5. A description of the quantities is given in Table 1.



In Fig. 13 we compare the rest frame 24 μm luminosity
functions at z = 0.5, constructed from the luminosities predicted
by the ANN and the original luminosities as given by
GRASIL. As with the UV comparison in the previous section,
the luminosity function derived from the ANN predictions
Figure 11. The rest frame 0.17 μm luminosity function at z = 3,
calculated from the original luminosities (solid lines), as extracted
from the GRASIL spectra, and the luminosity function constructed
from the ANN predicted luminosities (dashed lines). The black
lines show the total luminosity function, whose components are
quiescent galaxies (green lines) and burst galaxies (red lines). The
error bars on the model luminosity function indicate the Poisson
uncertainties due to the number of galaxies simulated.







Figure 13. The 24 μm luminosity function at z = 0.5, given by
original luminosities (solid lines) extracted from the GRASIL spectra,
and the luminosity function constructed from the luminosities
predicted by the ANN (dashed lines). The black lines show the total
luminosity function, with quiescent galaxies shown by the green
line and burst galaxies by the red line. The error bars indicate
the Poisson uncertainties due to the number of galaxies in each
luminosity bin.



is in very good agreement with that obtained directly from 
GRASIL. Again, this success extends to the luminosity func- 
tion of the burst sample, even though the errors are larger 
in this case. 



5.3 Submillimetre Galaxies at z = 2

Submillimetre galaxies (SMGs) are thought to be predominantly
dusty starbursts (Smail 2002). The emission in the
submillimetre region of the spectrum (around 850 μm) is
due to the heating of dust when it absorbs the UV light
emitted by young stars. It is also possible that some contribution
to the flux at these wavelengths comes from the
dust being heated by an AGN, although it is now believed
that this process is secondary (Alexander et al. 2003, 2005).
In the standard picture, SMGs are galaxies with prodigious
star formation rates of ~500-1000 M_⊙ yr^-1. In the Baugh
et al. model, a top-heavy IMF is adopted in merger-driven
starbursts. As a result, the model SMGs have more modest
star formation rates. Nevertheless, the model SMGs are
still the most massive galaxies in place with the highest star
formation rates at the median redshift z ≈ 2.

Here we predict the observer frame 850 μm luminosities
for galaxies at z = 2, using the ANN. We compare our predictions
with the correct values extracted from GRASIL spectra,
and evaluate the luminosity functions using both the
predicted and original luminosities.

The predicted and original 850 μm luminosities are compared
in Fig. 14. Further information about the errors is
presented in Table 11. As shown in the previous section, the
ANN predictions for the submillimetre are extremely good.
We are able to reproduce the luminosity of more than 90%
of galaxies with an accuracy of 10% or better, for both quiescent
(97.2%) and burst (93.2%) galaxies. The success of the
predictions is also reflected in the root mean square logarithmic
error, which is 0.09 for quiescent galaxies and 0.10 for the
burst sample. No clear correlation was found between the error
and galaxy properties.

The luminosity function in the observer frame 850 μm
at z = 2 is plotted in Fig. 15. Similar to what we found
for the UV and mid-IR bands, the submillimetre luminosity
functions calculated using the predicted luminosities
are virtually indistinguishable from the functions constructed
from the original 850 μm luminosities (extracted
from GRASIL spectra), over the whole luminosity range.



6 PREDICTING GALAXY COLOURS 

In this section we look at the performance of the ANN
when predicting joint luminosity or colour distributions.
In particular, we apply the ANN to the prediction of
UV-submillimetre, mid-IR-submillimetre and UV-mid-IR
colours, for a sample of GALFORM galaxies extracted at z = 0.
We define colour as L_band1/L_band2, where L_band = ⟨L_ν⟩, and
the ⟨⟩ brackets denote an average over the filter response of
the passband. We predict colours by training the ANN
independently for each band. Hence, to predict a colour, two
networks were trained, one for each luminosity. As we noted
in Section 4.4, this procedure produces better results than
predicting both luminosities simultaneously using two output
neurons.
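
The colour definition above can be illustrated with a short sketch. It assumes that the filter average is the response-weighted mean of L_ν over frequency, ⟨L_ν⟩ = ∫ L_ν R(ν) dν / ∫ R(ν) dν, which is a common convention but is not spelled out in this section. The function names and the mock luminosities standing in for the outputs of the two independently trained networks are hypothetical.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal integral of y(x), written out to avoid version-specific calls."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def band_luminosity(nu, l_nu, response):
    """Response-weighted mean of L_nu over a passband (assumed convention)."""
    l_nu = np.asarray(l_nu, dtype=float)
    response = np.asarray(response, dtype=float)
    return trapezoid(l_nu * response, nu) / trapezoid(response, nu)

def colour(l_band1, l_band2):
    """Colour expressed as the luminosity ratio L_band1 / L_band2."""
    return np.asarray(l_band1, dtype=float) / np.asarray(l_band2, dtype=float)

# Mock luminosities standing in for two independently trained networks,
# one per band (values are illustrative, not model output).
l_uv_pred = np.array([1.0e8, 4.0e8])
l_submm_pred = np.array([2.0e9, 1.0e10])
uv_submm_colour = colour(l_uv_pred, l_submm_pred)

# Sanity check of the filter average on a flat spectrum.
nu = np.linspace(1.0, 2.0, 11)
response = np.exp(-((nu - 1.5) ** 2) / 0.02)
flat_band = band_luminosity(nu, 3.0 * np.ones_like(nu), response)
```

Because each luminosity comes from its own network, the error on a colour combines the errors of both predictions, which is consistent with the colour errors being somewhat larger than those of the individual bands.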
Figure 14. The ratio of predicted to true luminosity at 850 μm in the observer frame. Quiescent galaxies are shown in the left-hand
panels, while the results for the burst sample are plotted in the right-hand panels. The solid and dashed lines in the upper panels have
the same meaning as in Fig. 4. The distribution of the logarithm of the ratio is presented in the lower panels. The range of the x-axes
is set to the 1st and 99th percentiles of the distribution, and the histograms are normalized to unity.



Sample     ε_L   P_{|e|<10%}   min    Q1    Q2    Q3   max

Quiescent  0.09  97.2          -11.7  -1.6  0.0   1.6  13.1
Bursts     0.10  93.2          -27.1  -3.2  -0.2  3.1  21.4



Table 11. Summary statistics for the predicted 850 μm luminosity error distribution, for quiescent and burst galaxies, at redshift z = 2.
A description of these quantities can be found in Table 1.



In the upper panels of Fig. 16 we show the comparison
between the predicted and true colours. The sample of
galaxies used is defined in terms of stellar mass for quiescent
galaxies and the stellar mass produced in the most recent
burst for starbursts, as described above, and hence
is not intended to match a particular observational
selection. From left to right we plot the 0.17 μm-850 μm,
24 μm-850 μm and 0.17 μm-24 μm colours. The error distributions
are summarized in Table 12. This plot reveals that
the ANN performs remarkably well when predicting colours.
Figure 16. The colour predicted by the ANN plotted against the true colour for a sample of GALFORM galaxies extracted at redshift z = 0.
From left to right, we plot the 0.17 μm-850 μm, 24 μm-850 μm and 0.17 μm-24 μm colours. In the lower panels, we show the distribution
of colour. The blue solid-shaded histogram represents the true distribution, with the contribution from quiescent galaxies shown by the
hatched histogram. The dashed line, unfilled histograms show the colours predicted by the ANN.



Colour        Sample     ε_L   P_{|e|<10%}   P1      Q1    Q2   Q3   P99

L_0.17/L_850  Quiescent  0.14  78.9          -42.1   -3.7  0.0  3.8  34.4
              Bursts     0.41  57.0          -474.5  -8.6  0.7  8.8  67.9
L_24/L_850    Quiescent  0.16  81.1          -70.7   -3.5  0.2  3.3  36.4
              Bursts     0.52  58.7          -733.1  -7.8  0.8  8.2  58.0
L_0.17/L_24   Quiescent  0.16  78.3          -40.3   -3.2  0.2  4.2  33.8
              Bursts     0.27  55.4          -103.8  -9.4  0.3  8.4  59.6



Table 12. Statistics of the error distribution associated with the prediction of colours at z = 0, using the ANN. A description of the
quantities is given in Table 1.



For quiescent galaxies, we find that more than 78% of the sample
have colours within 10% of the true colour. As noted
in previous sections, the ANN is not capable of achieving
such a performance for burst galaxies, with only ~55%
of galaxies possessing predicted colours within 10% of the
expected values. Nevertheless, the distributions of the predicted
colours, shown by the dashed, unshaded histograms
in the lower panels of Fig. 16, are very similar to the true
distributions (represented by the solid-shaded histograms).

Fig. 16 shows that at redshift z = 0, GALFORM predicts
that most of the galaxies have a 0.17 μm-850 μm colour in
the range 0.001-1, with a median ~0.04. In this plot, the
hatched histogram represents the contribution of the quiescent
galaxies to the total colour distribution. We see that
quiescent galaxies present redder UV-submillimetre colours
(smaller luminosity ratios) than the burst population, by a
factor of ~8. The distribution of the mid-IR-submillimetre
colours shows a bimodality, with one peak at ~0.38 and
the second at ~6.3. This double-peaked distribution is a
consequence of the different nature of quiescent and burst
galaxies. The plot indicates that quiescent galaxies in GALFORM
have distinctly redder 24 μm-850 μm colours than
burst galaxies. In the bottom-right panel of Fig. 16 we plot
the distribution of UV-mid-IR colour for galaxies at z = 0.
Model galaxies show colours between ~0.001 and 0.1, with a
median around 0.01. Both quiescent and burst galaxies display
similar distributions, with the former showing slightly
bluer colours.
Figure 15. The 850 μm observer frame luminosity function
at z = 2. The luminosity functions calculated from the original
GRASIL luminosities are shown by solid lines, while the ANN predicted
values are plotted using dashed lines. The green and red
lines show the luminosity function for the quiescent and burst
galaxies, respectively, which are the components of the full sample,
represented by the black line. The error bars indicate the
Poisson uncertainties due to the number of galaxies simulated.



7 THE OVERLAP BETWEEN UV AND SUBMILLIMETRE SELECTED GALAXIES

The star formation history of the Universe has been probed
at high redshift using samples selected in the optical
and at submillimetre wavelengths (e.g. Smail et al. 1997;
Steidel et al. 2003). Samples constructed in the optical are
sensitive to emission in the rest-frame UV at redshifts z > 2.
The UV flux is very sensitive to dust extinction. Hence, to
estimate the true star formation density from such observations
it is necessary to apply a large extinction correction to
the observed flux. This problem does not apply to samples
constructed at submillimetre wavelengths. However, there
are two different problems to overcome in this case: How
much of the dust heating is due to extincted starlight and
how much arises from AGN emission? What is the conversion
from submillimetre flux to total infrared luminosity?
The completeness of optically selected samples with regard
to measuring the star formation density has been called into
question. The possibility has been put forward that heavily
extincted star formation could be completely missed in optically
selected samples. To resolve these issues it is important
to establish the overlap between optically and submillimetre
selected samples (Adelberger & Steidel 2000).

In this section, we shed some light on this problem
by using the ANN to predict the optical magnitudes and
optical-sub-mm colour distributions of galaxies selected
from their sub-mm fluxes at redshift z = 2. Investigating in
detail the overlap between UV-selected and submillimetre-selected
samples requires a large galaxy sample, which can
only be realistically generated using the ANN approach. We
choose redshift z = 2 for this comparison because it is the
Figure 17. Luminosity functions of sub-mm and rest-frame UV
selected galaxies at z = 2. Upper panel: luminosity functions at
observed wavelengths of 850 μm (solid line) and 450 μm (dashed
line). Lower panel: luminosity function in observed R-band, corresponding
to rest-frame UV, with apparent R-band magnitude
shown on bottom axis and rest-frame absolute AB magnitude
shown on the top axis. In the lower panel, we distinguish between
the total R-band luminosity function (dotted line), and that for
the subsets of galaxies with 850 μm fluxes S_ν(850 μm) ≥ 1 mJy
(solid line) and S_ν(850 μm) ≥ 5 mJy (dashed line).

typical redshift measured for galaxies in current faint sub-mm
surveys (Chapman et al. 2005). One caveat to be borne
in mind in this comparison is that the current version of our
model does not follow the heating of the dust by AGN, so all
the submillimetre radiation is emitted by the reprocessing
of UV starlight by dust.

We first plot in Fig. 17 the luminosity functions at observed
wavelengths of 450 μm and 850 μm (top panel) and
in the observed R-band, which at z = 2 corresponds to the
rest-frame far-UV, λ = 0.23 μm (bottom panel). The integrated
number density of sub-mm galaxies (SMGs) is listed
in Table 13. As expected, the plot shows that bright galaxies
are rarer than fainter galaxies: submillimetre galaxies with
flux densities S_ν(850 μm) ≥ 1 mJy are ~10 times more
abundant than galaxies with S_ν(850 μm) ≥ 5 mJy. For these
brighter galaxies, we calculate a space density of 7.9 × 10^-5
Mpc^-3, which is of the same order as the value estimated
observationally by Blain et al. (2004) of 2.7 × 10^-5 Mpc^-3.
At a given flux, submillimetre selected galaxies at 450 μm
are more abundant than their 850 μm counterparts, with the
possible exception of galaxies brighter than 50 mJy. In our
simulation, we find approximately one submillimetre galaxy
with a 450 μm flux density greater than 1 mJy in every 220
Mpc^3.
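
The abundance statements in this paragraph follow directly from the number densities listed in Table 13, as the short check below shows. The only assumption is the unit of the tabulated values, taken here to be 10^-5 Mpc^-3 as stated in the table caption; the variable names are illustrative.

```python
# Number densities from Table 13, in units of 1e-5 Mpc^-3 (unit as read
# from the table caption).
UNIT = 1.0e-5  # Mpc^-3

n_450_ge_1mjy = 456.3 * UNIT   # S(450 um) >= 1 mJy
n_850_ge_1mjy = 83.5 * UNIT    # S(850 um) >= 1 mJy
n_850_ge_5mjy = 7.9 * UNIT     # S(850 um) >= 5 mJy

# Relative abundance of the fainter and brighter 850 um samples.
abundance_ratio = n_850_ge_1mjy / n_850_ge_5mjy

# Volume containing, on average, one 450 um source above 1 mJy.
volume_per_450_source = 1.0 / n_450_ge_1mjy  # Mpc^3

print(f"850 um abundance ratio (>=1 mJy vs >=5 mJy): {abundance_ratio:.1f}")
print(f"Volume per 450 um source above 1 mJy: {volume_per_450_source:.0f} Mpc^3")
```

The ratio of about 10.6 and the volume of about 219 Mpc^3 match the "~10 times" and "every 220 Mpc^3" quoted in the text.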

The bottom panel of Fig. 17 reveals that, in our model,
a large fraction of SMGs should be detectable in current
deep optical surveys. For example, around half of the SMGs
with S_ν(850 μm) ≥ 5 mJy are predicted to be brighter than
R_AB = 25, which is similar to the magnitude limit used by
Steidel et al. (2004) in their survey for star-forming galaxies
at z ~ 2 using BX and BM colour selection on the rest-frame
UV emission from these galaxies. (Note that for the
adopted cosmology, R_AB = 25 corresponds to an absolute
magnitude of M_AB - 5 log h = -19.1 at this redshift.) We
find a median magnitude R_AB = 25.2 for SMGs at z = 2
with S_ν(850 μm) ≥ 5 mJy. This seems quite consistent with
the observational values from Chapman et al. (2005), who
found a median R_AB = 25.4 for a sample of radio-detected
SMGs with S_ν(850 μm) > 5 mJy. This panel also reveals another
important result: only ≈ 1% of all the galaxies brighter
than R_AB = 25 are predicted to have 850 μm flux densities
brighter than 5 mJy.
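
The two overlap fractions discussed here, the fraction of SMGs that pass the optical limit and the fraction of optically selected galaxies that are SMGs, can be computed from a mock catalogue as sketched below. The function name, its default limits and the tiny example catalogue are assumptions for illustration; in practice the magnitudes and fluxes would come from the ANN predictions.

```python
import numpy as np

def overlap_fractions(r_ab, s850_mjy, r_limit=25.0, s_limit_mjy=5.0):
    """Overlap between optical and sub-mm selections.

    Returns (fraction of sub-mm selected galaxies brighter than r_limit,
             fraction of optically selected galaxies above s_limit_mjy).
    A fraction is NaN when its parent sample is empty.
    """
    r_ab = np.asarray(r_ab, dtype=float)
    s850_mjy = np.asarray(s850_mjy, dtype=float)
    optical = r_ab < r_limit           # brighter than the magnitude limit
    submm = s850_mjy >= s_limit_mjy    # above the flux limit
    n_both = np.count_nonzero(optical & submm)
    n_submm = np.count_nonzero(submm)
    n_optical = np.count_nonzero(optical)
    frac_submm = n_both / n_submm if n_submm else float("nan")
    frac_optical = n_both / n_optical if n_optical else float("nan")
    return frac_submm, frac_optical

# Tiny mock catalogue (illustrative values only).
r_ab = np.array([24.0, 24.5, 26.0, 27.0, 24.9, 25.5])
s850 = np.array([6.0, 0.1, 7.0, 0.2, 0.3, 0.05])
frac_smg_optical, frac_optical_smg = overlap_fractions(r_ab, s850)
```

For comparison, the number densities in Table 13 give 3.9/7.9 ≈ 0.49 for the fraction of SMGs above 5 mJy that are also brighter than R_AB = 25, in line with the "around half" quoted above.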

In Fig. 18 we plot the distribution of observer-frame R-band
to sub-mm (850 μm) colours predicted by the ANN,
for galaxies in the Millennium Simulation at z = 2. We
plot the colour distributions for SMGs with flux densities
S_ν(850 μm) ≥ 1 and ≥ 5 mJy in the two panels. For galaxies
with S_ν(850 μm) ≥ 1 mJy, we find a median colour
of S_ν(R)/S_ν(850 μm) ≈ 4 × 10^-4. Brighter SMGs with
S_ν(850 μm) ≥ 5 mJy display a colour distribution which
is on average ~10 times redder, with a median colour of
4 × 10^-5. The colour distributions are also seen to be very
broad, especially for the brighter sub-mm flux limit, which
spans orders of magnitude in colour. We also show (as hatched
histograms) the colour distributions which result for each
sub-mm flux limit if we further select only galaxies with optical
magnitudes brighter than R_AB < 25. This shows how
we lose the redder part of the optical-sub-mm colour distribution
with this optical selection.
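
The optical-sub-mm colour used here is a ratio of flux densities, so an R-band AB magnitude must first be converted to a flux density. The sketch below uses the standard AB zero point of 3631 Jy; the function names are hypothetical, and treating the R-band magnitude as a monochromatic AB flux density is a simplifying assumption.

```python
AB_ZERO_POINT_JY = 3631.0  # flux density of an m_AB = 0 source, in Jy

def ab_mag_to_jy(m_ab):
    """Convert an AB magnitude to a flux density in Jy."""
    return AB_ZERO_POINT_JY * 10.0 ** (-0.4 * m_ab)

def optical_submm_colour(r_ab, s850_mjy):
    """Colour expressed as the flux ratio S(R) / S(850 um)."""
    return ab_mag_to_jy(r_ab) / (s850_mjy * 1.0e-3)

# A galaxy at the median magnitude of the bright SMG sample, sitting
# exactly at the 5 mJy flux limit.
colour_at_limit = optical_submm_colour(25.2, 5.0)
print(f"S(R)/S(850 um) = {colour_at_limit:.1e}")
```

A galaxy with R_AB = 25.2 and exactly 5 mJy has a ratio of about 6 × 10^-5; sources above the flux limit have smaller ratios, so this is consistent with a median of a few × 10^-5 for the bright sample.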



8 DISCUSSION AND CONCLUSIONS 

In this paper we have introduced a new method to rapidly 
predict accurate spectral energy distributions over a wide 
wavelength range from a small number of galaxy proper- 
ties, using artificial neural networks (ANN). Granato et al. 
(2000) combined the GALFORM semi-analytical galaxy forma- 
tion code with the spectro-photometric code GRASIL. The 
use of GRASIL allows a more comprehensive and accurate 
treatment of the effect of dust on the SED of the galaxy, 







Figure 18. The distribution of observer-frame R-band to
submillimetre (850 μm) colours predicted by the ANN, for galaxies
in the Millennium Simulation at z = 2. The colour is expressed
as a flux ratio S_ν(R)/S_ν(850 μm). The top and bottom
panels show the distributions for galaxies with sub-mm flux densities,
S_ν(850 μm), brighter than 1 and 5 mJy respectively. In each
panel, the filled histogram shows the full distribution, while the
hatched histogram shows the contribution to this from galaxies
which are also brighter than R_AB = 25 in the optical.



predicting the dust emission in the mid- and far-IR regions, 
as well as improving the accuracy of the predicted spectra 
in the UV. Unfortunately, GRASIL takes several minutes to 
run for each galaxy, which prohibits the direct application of 
this code to populate large dark matter simulation volumes 
with galaxies. The ANN provides a fast, simple and flexible 
means to calculate accurate galaxy spectra based on GRASIL. 
Here, we have carried out the first tests of the method and
present applications to galaxy luminosities and colours. 

The ANN is trained using a sample of galaxies for which 
GRASIL has been run to compute spectra. We found that 
the ANN approach performs well when predicting galaxy 
spectra from galaxy properties. The best performing ANN 
architecture we found is a simple supervised, feed-forward 
net, composed of 12 input galaxy properties, two hidden lay- 
ers with 30 neurons each, and one output neuron. The ANN 
works best when predicting the luminosity at one wavelength 
at a time, rather than the whole spectrum. Due to the in- 
herent variety in the spectra of galaxies which are undergo- 
ing a burst of star formation, or which recently underwent 
a burst, we found it best to train the ANN separately for 
samples of quiescent and bursting galaxies. The ANN needs 
to be trained at each redshift of interest and for each set of 
GALFORM plus GRASIL parameters. The luminosities predicted 
by the ANN agree remarkably well with those computed directly
using GRASIL. In the observer frame 850 μm at z = 2,
over 90% of the ANN predicted luminosities lie within 10%
of the true luminosities calculated directly from GRASIL. The
ANN works somewhat less well in the UV and mid-IR. Nevertheless,
at all the wavelengths considered we find that the
Sample                        450 μm          850 μm          850 μm and R_AB < 25 mag
                              (10^-5 Mpc^-3)  (10^-5 Mpc^-3)  (10^-5 Mpc^-3)

Galaxies with S_ν ≥ 0.5 mJy   773.9           211.2           124.6
Galaxies with S_ν ≥ 1 mJy     456.3           83.5            53.8
Galaxies with S_ν ≥ 5 mJy     45.9            7.9             3.9


Table 13. The space density of submillimetre galaxies in the Millennium Simulation at z = 2. We distinguish between galaxies with
S_ν(450 μm) or S_ν(850 μm) ≥ 0.5 mJy, 1 mJy and 5 mJy respectively. In the third column, we further limit our sample by only considering
those galaxies brighter than R_AB < 25 mag. The number densities in the table are quoted in units of 10^-5 Mpc^-3.



luminosity functions predicted by the ANN are in excellent 
agreement with those computed directly with GRASIL. 

The ANN also performs well when predicting the
colours of galaxies. In this case, the ANN is trained for each
band individually. Given this success, we applied the ANN
to investigate the overlap between samples of rest-frame UV
and sub-mm selected galaxies at z = 2. This problem is
ideally suited to the ANN approach, as it requires a large
sample of galaxies covering a wide range of luminosity. Although
we predict that 50% of bright submillimetre sources
(850 μm flux greater than 5 mJy) should have optical magnitudes
brighter than R_AB < 25, these SMGs make up only
a small fraction of an optically selected sample at the same
magnitude limit. In an optically selected sample of galaxies
at z = 2 brighter than R_AB < 25, 10% are predicted to have
an 850 μm flux brighter than 1 mJy and 1% are expected to
be brighter than 5 mJy. These predictions seem consistent
with recent observational constraints (e.g. Chapman et al.
2005).

The success of our new ANN approach in generating 
accurate predictions of the spectral energy distributions of 
large samples of galaxies means that we can now produce 
mock catalogues of galaxies for forthcoming surveys such 
as the Herschel ATLAS survey, which will cover 600 square 
degrees in five far-infrared bands, and the SCUBA-2 Cos- 
mology Legacy Survey, which will cover around 40 square 
degrees to a fainter flux limit in the sub-mm. In a companion 
paper we apply the ANN technique to populate the Millen- 
nium Simulation with galaxies with accurate sub-mm fluxes 
to make predictions for the clustering of dusty galaxies. 